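Design and Implementation of Chinese-English and Chinese-German Numeral Machine Translation Algorithms Based on Automatic Syntactic Parsing
Published: 2018-08-28 16:22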
[Abstract]: This thesis first reviews progress in research on the machine translation of numerals and describes how numerals are constructed in Chinese, English and German. Drawing on X-bar theory from generative grammar, it then proposes a syntactic model for the internal structure of Modern Chinese numerals. Based on this model and on the distribution of coefficient words (系数词) and place-value words (位数词) in Chinese numerals, it designs a context-free grammar rule base that describes Chinese numeral structure and is used by the CYK parsing algorithm. Within the translation system, syntactic parsing serves mainly to determine how many Arabic digits "0" a "零" in a Chinese numeral corresponds to. To compensate for shortcomings in the rule base that give rise to syntactic ambiguity and thereby affect the determination of "零", the thesis adopts a divide-and-conquer strategy and develops an algorithm that automatically translates Chinese numerals into the corresponding English and German numerals. The algorithm is implemented and tested in Python, its output is compared with the online translation results of Baidu, Google and Youdao, and four aspects of the system that still need improvement are identified. The architecture of the proposed multilingual numeral translation algorithm is extensible: modules can be added as needed during later development to support numeral translation between any two specified languages.
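The divide-and-conquer idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: it uses no CYK parser or grammar rule base, and every name in it (`parse_numeral`, `parse_section`, `to_english`) is hypothetical. It assumes well-formed numerals written with 零 to 九, 两, 十, 百, 千, 万 and 亿. The numeral is split at its largest place-value word, each part is evaluated recursively, and the integer is then rendered as English words. Because place value is read from the unit words, the number of "0" digits that a "零" stands for falls out of the arithmetic.

```python
# Illustrative sketch only; names and structure are assumptions, not the thesis's code.
DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = [("亿", 10**8), ("万", 10**4)]  # tried largest first


def parse_section(text: str) -> int:
    """Evaluate a numeral below 10,000, i.e. one without 万 or 亿."""
    total, digit = 0, 0
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]  # "零" simply resets the pending digit to 0
        elif ch in SMALL_UNITS:
            # A bare unit such as the "十" in "十五" counts as one unit.
            total += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return total + digit


def parse_numeral(text: str) -> int:
    """Divide and conquer: split at the largest unit word, recurse on both sides."""
    for unit, value in BIG_UNITS:
        head, sep, tail = text.rpartition(unit)
        if sep:
            return parse_numeral(head) * value + parse_numeral(tail)
    return parse_section(text)


ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
SCALES = [(10**9, "billion"), (10**6, "million"), (10**3, "thousand"),
          (10**2, "hundred")]


def to_english(n: int) -> str:
    """Render a non-negative integer as English words (no "and")."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    for value, name in SCALES:
        if n >= value:
            head, rest = divmod(n, value)
            words = to_english(head) + " " + name
            return words + (" " + to_english(rest) if rest else "")
    raise ValueError("negative numbers are not supported")


if __name__ == "__main__":
    value = parse_numeral("一万零五十")
    print(value, to_english(value))  # 10050 ten thousand fifty
```

Under the same assumptions, a German renderer would be a further function that takes the same integer, which mirrors the modular, extensible architecture described in the abstract.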
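[Degree-granting institution]: Shanghai International Studies University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: H085.3
Article ID: 2209937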
Article link: https://www.wllwen.com/wenyilunwen/yuyanyishu/2209937.html