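Machine Translation System Integrating Internet Translation Engines
Published: 2018-03-12 14:56
Topic: machine translation; Focus: system combination; Source: Inner Mongolia University, master's thesis, 2017; Type: degree thesis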
【Abstract】: Since its emergence, machine translation has developed over several decades and achieved remarkable results, with a steady stream of new methods proposed along the way. The current mainstream approaches are statistical machine translation and, most recently, neural machine translation. Because each approach has its own strengths, system combination has been proposed so that systems can complement one another and produce better translations. Machine translation is now mature in industrial use: Baidu, Youdao, Google and others all offer online translation services. This study combines these Internet translation engines with a system trained using the Moses statistical machine translation toolkit. Depending on the basic unit it operates on, system combination can be sentence-level, phrase-level or word-level. This study evaluates three approaches on a Chinese-English translation task: sentence-level, word-level and MEMT-based combination.

Sentence-level combination uses minimum Bayes risk (MBR) decoding with several different loss functions. Using TER as the loss function gave the best result, a BLEU gain of 0.24 points over the best single system before combination. Word-level combination builds a confusion network and decodes it to obtain the output. Several groups of comparative experiments varied the word alignment method used to build the confusion network and the features added during decoding. The results show that TER-based word alignment with stem matching, together with several effective decoding features, improves combination quality. This configuration gave the best result of the study: a BLEU gain of 0.78 points over the best single system and 3.01 points over the worst. MEMT-based combination performed only moderately, with a BLEU gain of 0.48 points over the best single system. These results show that a machine translation system that combines Internet translation engines can improve translation quality. Finally, the study implements a browser/server (B/S) system that combines Internet translation engines using word-level combination.
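The abstract names MBR decoding for sentence-level combination but not its exact formulation. Below is a minimal sketch of generic MBR selection. It picks the candidate whose weighted loss against all candidates, which serve as pseudo-references, is smallest. It assumes uniform system weights and uses a simplified TER: word edit distance divided by reference length, without TER's block shifts. The names `word_edit_distance`, `ter_like_loss` and `mbr_select` are illustrative, not taken from the thesis.

```python
from typing import Callable, List, Sequence


def word_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions).
    Real TER also allows block shifts; those are not modelled here."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(prev[j] + 1,                # drop hyp word h
                         cur[j - 1] + 1,             # insert ref word r
                         prev[j - 1] + int(h != r))  # match or substitute
        prev = cur
    return prev[-1]


def ter_like_loss(hyp: str, ref: str) -> float:
    """Edit distance divided by reference length (TER without shifts)."""
    h, r = hyp.split(), ref.split()
    return word_edit_distance(h, r) / max(len(r), 1)


def mbr_select(candidates: List[str], weights: List[float],
               loss: Callable[[str, str], float] = ter_like_loss) -> str:
    """Minimum Bayes risk selection: return the candidate whose expected loss,
    using the weighted candidates as pseudo-references, is smallest."""
    def risk(e: str) -> float:
        return sum(w * loss(e, ref) for ref, w in zip(candidates, weights))
    return min(candidates, key=risk)


if __name__ == "__main__":
    # Hypothetical outputs from three systems, weighted uniformly (assumption).
    outputs = ["the cat sat on the mat",
               "the cat sits on the mat",
               "a cat sat on mat"]
    print(mbr_select(outputs, [1 / 3] * 3))  # the cat sat on the mat
```

Sentence-level combination only chooses among whole outputs and never edits words. Passing a different `loss` callable is how one would compare loss functions, as the abstract describes.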
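Word-level combination is only summarised in the abstract. The sketch below shows the general confusion-network idea under several simplifying assumptions:

- the backbone hypothesis is given (in practice it is often chosen by MBR);
- each hypothesis is aligned to the backbone by plain Levenshtein alignment rather than full TER with shifts;
- a hypothetical suffix stripper, `crude_stem`, stands in for a real stemmer, so that a stem match costs nothing;
- words inserted relative to the backbone are discarded;
- decoding uses only a weighted vote per slot, not the multiple features the thesis reports.

All function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

EPS = "<eps>"  # empty arc: this hypothesis has no word at the slot


def crude_stem(word: str) -> str:
    """Hypothetical suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def _sub_cost(a: str, b: str) -> int:
    """0 if the two words share a (crude) stem, else 1."""
    return 0 if crude_stem(a) == crude_stem(b) else 1


def align(backbone: Sequence[str],
          hyp: Sequence[str]) -> List[Tuple[Optional[int], Optional[str]]]:
    """Levenshtein-align hyp to backbone (no TER shifts). Returns
    (backbone_index, hyp_word) pairs: index None marks a word inserted
    relative to the backbone, word None marks a backbone word hyp lacks."""
    n, m = len(backbone), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + _sub_cost(backbone[i - 1], hyp[j - 1]),
                             cost[i - 1][j] + 1,   # backbone word missing in hyp
                             cost[i][j - 1] + 1)   # extra word in hyp
    pairs: List[Tuple[Optional[int], Optional[str]]] = []
    i, j = n, m
    while i > 0 or j > 0:  # backtrace, preferring the diagonal
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + _sub_cost(backbone[i - 1], hyp[j - 1]):
            pairs.append((i - 1, hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def build_confusion_network(backbone: List[str], hyps: List[List[str]],
                            weights: List[float]) -> List[Dict[str, float]]:
    """One slot per backbone word; each slot maps a word (or EPS) to its total weight."""
    slots: List[Dict[str, float]] = [defaultdict(float) for _ in backbone]
    for hyp, w in zip(hyps, weights):
        for b_idx, word in align(backbone, hyp):
            if b_idx is None:
                continue  # simplification: insertions relative to the backbone are dropped
            slots[b_idx][EPS if word is None else word] += w
    return slots


def decode(slots: List[Dict[str, float]]) -> str:
    """Pick the highest-weight arc in each slot and skip empty arcs."""
    best = (max(slot, key=slot.get) for slot in slots)
    return " ".join(word for word in best if word != EPS)


if __name__ == "__main__":
    a = "the cat sit on the mat".split()   # backbone (hypothetical)
    b = "the cat sits on a mat".split()
    c = "a cat sits on the mat".split()
    print(decode(build_confusion_network(a, [a, b, c], [1.0, 1.0, 1.0])))
    # the cat sits on the mat
```

In this example each hypothesis contains one error, yet the per-slot vote produces a sentence that matches none of them exactly. That is the motivation for combining at the word level rather than the sentence level. Stem matching lets "sit" and "sits" share a slot instead of being treated as an unrelated substitution.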
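【Degree-granting institution】: Inner Mongolia University
【Degree】: Master's
【Year awarded】: 2017
【Classification number】: TP391.2 (Chinese Library Classification)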