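Design and Implementation of a Plug-in for Instant Simplified/Traditional Chinese Translation of Web Pages
Published: 2018-04-19 23:03
Topic: web pages + Simplified/Traditional Chinese; Source: master's thesis, Inner Mongolia University, 2014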
[Abstract]: Written Chinese has a history of several thousand years. It carries Chinese culture and serves as a means of communication among Chinese speakers and between China and the rest of the world. For historical reasons, Hong Kong and Taiwan still use traditional characters while the mainland uses simplified characters, which is a substantial obstacle to exchanging Chinese-language information among the three regions. As network technology has developed, the web browser has become essential software for going online and a natural platform for information exchange.
This thesis implements simplified/traditional conversion of Chinese web pages as a browser plug-in. Because Internet Explorer held a large market share, the plug-in targets IE10, the most popular version at the time of writing.
Common encodings for Chinese web pages are GB2312, GBK, UTF-8 and BIG-5. Mainland China mostly uses GB2312, GBK and UTF-8, while Taiwan, Hong Kong and some overseas Chinese communities mostly use BIG-5. The GB2312 character set contains only simplified characters and BIG-5 only traditional characters, whereas GBK and UTF-8 can represent both. Based on an analysis of these encodings, the thesis divides web-page conversion into two categories: simplified/traditional conversion within a single encoding, and conversion between different encodings. Because a simplified character and its traditional counterpart have different codes, converting between them requires a bidirectional index that maps each character to its counterpart; this index is built by querying and batch-converting with existing encoding-conversion and simplified/traditional conversion tools.
For conversion efficiency, simplified and traditional codes are stored separately, and an efficient hash algorithm is used for lookup and replacement. Once the plug-in is registered and the user has chosen the form of Chinese in which pages should be displayed, the system automatically retrieves the page's document content, identifies its encoding, selects the appropriate conversion scheme, performs the conversion, and finally returns the translated page.
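The pipeline described above begins by identifying the page encoding and deciding between within-encoding and cross-encoding conversion. The following is a minimal Python sketch of that decision, assuming the charset is read from a `<meta>` tag and that cross-encoding output goes to BIG-5 (for traditional) or GB2312 (for simplified). The names `detect_charset`, `normalize_charset` and `choose_scheme`, and the choice of output encodings, are hypothetical and not taken from the thesis.

```python
import re

# Character-set coverage as characterised in the abstract:
# GB2312 holds only simplified, BIG-5 only traditional, GBK and UTF-8 hold both.
SIMPLIFIED_ONLY = {"gb2312"}
TRADITIONAL_ONLY = {"big5"}
BOTH = {"gbk", "utf-8"}

# Assumed output encodings for cross-encoding conversion (hypothetical choice).
CROSS_TARGET = {"traditional": "big5", "simplified": "gb2312"}

_META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE)


def normalize_charset(name: str) -> str:
    """Map spellings such as 'GB_2312', 'Big-5' or 'UTF8' to one canonical key."""
    key = name.strip().lower().replace("_", "-")
    aliases = {"gb-2312": "gb2312", "big-5": "big5", "utf8": "utf-8"}
    return aliases.get(key, key)


def detect_charset(html: bytes, default: str = "utf-8") -> str:
    """Return the charset declared in a <meta> tag, or `default` if none is found."""
    match = _META_CHARSET.search(html)
    return normalize_charset(match.group(1).decode("ascii")) if match else default


def choose_scheme(charset: str, target: str) -> tuple[str, str]:
    """Return (scheme, output charset) for showing a `charset` page in the `target` script."""
    if target not in CROSS_TARGET:
        raise ValueError("target must be 'simplified' or 'traditional'")
    if charset in BOTH:
        return ("same-encoding", charset)        # swap characters, keep the encoding
    if charset in (SIMPLIFIED_ONLY if target == "simplified" else TRADITIONAL_ONLY):
        return ("none", charset)                 # page already uses the target script
    if charset in SIMPLIFIED_ONLY | TRADITIONAL_ONLY:
        # the source set cannot hold the target script, so the output encoding changes
        return ("cross-encoding", CROSS_TARGET[target])
    raise ValueError(f"unsupported charset: {charset}")


if __name__ == "__main__":
    page = b'<meta http-equiv="Content-Type" content="text/html; charset=GB2312">'
    print(choose_scheme(detect_charset(page), "traditional"))
```

A real page may instead declare its charset in the HTTP `Content-Type` header, which normally takes precedence over the `<meta>` tag, so a fuller implementation would check the header first.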
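Storing simplified and traditional codes separately and looking them up by hashing could look roughly like the sketch below: two parallel sequences hold the two forms, and two Python dictionaries (hash tables) provide the bidirectional index. The twelve-character sample and the names `build_index` and `convert` are illustrative assumptions; the thesis's actual tables and hash function are not described in the abstract.

```python
# Illustrative sample only; a real table holds thousands of pairs.
# Simplified and traditional forms are stored in two separate sequences:
# position i in SIMPLIFIED corresponds to position i in TRADITIONAL.
SIMPLIFIED = "简体网页汉语国书学门电脑"
TRADITIONAL = "簡體網頁漢語國書學門電腦"


def build_index(simplified: str, traditional: str) -> tuple[dict[str, str], dict[str, str]]:
    """Build the bidirectional index as two hash tables (Python dicts)."""
    if len(simplified) != len(traditional):
        raise ValueError("both tables must have the same length")
    s2t = dict(zip(simplified, traditional))   # simplified -> traditional
    t2s = dict(zip(traditional, simplified))   # traditional -> simplified
    return s2t, t2s


def convert(text: str, table: dict[str, str]) -> str:
    """Replace each character via one hash lookup; unmapped characters pass through unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


S2T, T2S = build_index(SIMPLIFIED, TRADITIONAL)

if __name__ == "__main__":
    print(convert("中文网页", S2T))   # 中文網頁
    print(convert("簡體漢語", T2S))   # 简体汉语
```

Within-encoding conversion (GBK or UTF-8 pages) reduces to this substitution on the decoded text, after which the page is re-encoded in its original charset. A one-to-one character map is a simplification: some simplified characters correspond to several traditional ones (for example 发 can be 發 or 髮), so such cases need context or word-level rules.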
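For conversion between different encodings, the decoded text must be re-encoded in a character set that can hold the target script, and the page's declared charset should be updated to match. The sketch below assumes a GB2312 page is turned into BIG-5 and vice versa; `convert_page` and the miniature translation tables are hypothetical.

```python
import re

# Hypothetical miniature tables; a real index would cover thousands of characters.
S2T = str.maketrans("简体网页汉语国", "簡體網頁漢語國")
T2S = str.maketrans("簡體網頁漢語國", "简体网页汉语国")


def convert_page(page: bytes, src: str, dst: str, table: dict[int, int]) -> bytes:
    """Convert a page's characters and re-encode it from charset `src` to `dst`."""
    text = page.decode(src)            # 1. decode with the page's own encoding
    text = text.translate(table)       # 2. replace each mapped character via the table
    # 3. rewrite the declared charset so the browser decodes the result correctly
    pattern = r"charset=([\"']?)" + re.escape(src) + r"\b"
    text = re.sub(pattern, r"charset=\g<1>" + dst, text, flags=re.IGNORECASE)
    # 4. re-encode; characters the target set lacks become &#NNNN; references
    return text.encode(dst, errors="xmlcharrefreplace")


if __name__ == "__main__":
    page = '<meta charset="gb2312"><p>中文网页</p>'.encode("gb2312")
    print(convert_page(page, "gb2312", "big5", S2T).decode("big5"))
```

Encoding with `errors="xmlcharrefreplace"` turns any character the target set cannot represent (for example a simplified character missing from the table) into an HTML numeric character reference instead of raising an error, so the browser can still render it.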
[Degree-granting institution]: Inner Mongolia University
[Degree level]: Master's
[Year conferred]: 2014
[Chinese Library Classification]: TP393.092; TP391.1