Mongolian-Chinese Neural Machine Translation Incorporating Statistical Machine Translation Features

Published: 2019-01-25 21:12
[Abstract]: As machine translation has developed, statistical machine translation (SMT) has reached a plateau where further gains are hard to obtain, so researchers have increasingly turned to neural machine translation (NMT). NMT has achieved strong results on large-scale corpora, but little work has studied NMT on small-scale corpora. As a new approach, NMT also has several limitations: (1) to reduce training complexity, NMT usually restricts the vocabulary to a fixed size, which causes a severe out-of-vocabulary (OOV) word problem that seriously degrades translation quality; (2) NMT decoding lacks a mechanism to guarantee that every source word is translated, so it tends to produce short translations; (3) NMT cannot make good use of a language model. For these reasons, this thesis implements NMT on a small-scale Mongolian-Chinese parallel corpus and proposes using SMT features to alleviate these problems. First, it builds an attention-based Mongolian-Chinese NMT system. Second, it extracts three SMT features (the translation model, word feedback information, and the language model) and defines a feature function for each. Third, it uses GIZA++ on the Mongolian-Chinese parallel corpus to build a Mongolian-Chinese alignment dictionary, and uses IRSTLM to build a Chinese language model. Fourth, it integrates the alignment dictionary, the language model, and the word feedback information into the decoding of the attention-based NMT system through a log-linear model, addressing the limitations above. Finally, it proposes two methods for handling OOV words, one applied during translation and one applied after translation, which greatly reduce the number of OOV words. Experimental results show that Mongolian-Chinese NMT incorporating SMT features clearly improves translation quality: the BLEU score rises to 30.66, output sentence length increases from 16.7 to 19.1 words, and 86% of the OOV words produced by NMT are eliminated.
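The abstract does not give the feature functions or their weights, so the following is only a minimal sketch of how a log-linear model can combine an NMT word probability with the three SMT features at a single decoding step. Everything concrete here is an assumption for illustration: the weights in `WEIGHTS`, the toy bigram table standing in for the IRSTLM language model, the attention-weighted lookup in the alignment dictionary used as the translation-model feature, and the treatment of word feedback information as a constant per-word bonus. The tokens `bi`, `nom`, `unshina` are hypothetical romanized Mongolian.

```python
import math
from typing import Dict, List, Sequence, Tuple

EOS = "</s>"
FLOOR = 1e-6  # probability floor so math.log never receives 0
# Hypothetical feature weights; real values would be tuned on held-out data.
WEIGHTS: Dict[str, float] = {"nmt": 1.0, "tm": 0.3, "lm": 0.2, "wf": 0.5}


def tm_feature(word: str, src: Sequence[str], attn: Sequence[float],
               dictionary: Dict[Tuple[str, str], float]) -> float:
    """Log of the attention-weighted alignment-dictionary probability of `word`."""
    p = sum(a * dictionary.get((s, word), 0.0) for s, a in zip(src, attn))
    return math.log(max(p, FLOOR))


def lm_feature(word: str, history: List[str],
               bigram: Dict[Tuple[str, str], float]) -> float:
    """Log P(word | previous word) from a toy bigram table (stand-in for an n-gram LM)."""
    prev = history[-1] if history else "<s>"
    return math.log(max(bigram.get((prev, word), 0.0), FLOOR))


def step_scores(nmt_probs: Dict[str, float], src: Sequence[str],
                attn: Sequence[float], history: List[str],
                dictionary: Dict[Tuple[str, str], float],
                bigram: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Log-linear score of each candidate target word at one decoding step."""
    scores: Dict[str, float] = {}
    for word, p in nmt_probs.items():
        feats = {
            "nmt": math.log(max(p, FLOOR)),
            # The dictionary has no entry for the end symbol, so skip the TM feature for it.
            "tm": 0.0 if word == EOS else tm_feature(word, src, attn, dictionary),
            "lm": lm_feature(word, history, bigram),
            # Word bonus (assumed form of word feedback): +1 per real word, 0 for ending.
            "wf": 0.0 if word == EOS else 1.0,
        }
        scores[word] = sum(WEIGHTS[name] * value for name, value in feats.items())
    return scores


if __name__ == "__main__":
    src = ["bi", "nom", "unshina"]  # hypothetical romanized Mongolian tokens
    attn = [0.1, 0.8, 0.1]  # attention of the current step over the source words
    dictionary = {("bi", "我"): 0.9, ("nom", "书"): 0.8, ("unshina", "读"): 0.7}
    bigram = {("读", "书"): 0.5, ("读", EOS): 0.05}
    nmt_probs = {"书": 0.4, EOS: 0.5}  # NMT alone would end the sentence here
    scores = step_scores(nmt_probs, src, attn, ["我", "读"], dictionary, bigram)
    print(max(scores, key=scores.get))  # prints 书
```

In the toy call, the NMT model alone slightly prefers ending the sentence (0.5 for `</s>` vs. 0.4 for 书), while the combined log-linear score prefers continuing with 书. This illustrates, under the assumptions above, how such features can push decoding away from overly short outputs.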
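The abstract mentions two OOV-handling methods (one during translation, one after translation) without describing them. As an assumption, the sketch below shows one common post-processing technique, attention-based unknown-word replacement: each `<unk>` in the output is replaced by the most probable alignment-dictionary translation of the source word it attends to most, falling back to copying that source word. The function name `replace_unk`, the `<unk>` token, the lexicon format, and the romanized Mongolian tokens are hypothetical, and the thesis's own method may differ.

```python
from typing import Dict, List, Sequence

UNK = "<unk>"


def replace_unk(output: List[str], src: Sequence[str],
                attn: Sequence[Sequence[float]],
                lexicon: Dict[str, Dict[str, float]]) -> List[str]:
    """Replace every <unk> in `output` using the source word it attends to most.

    attn[t][j]: attention weight of output position t on source position j.
    lexicon: source word -> {target word: probability}, e.g. built from word alignments.
    """
    result: List[str] = []
    for t, word in enumerate(output):
        if word != UNK:
            result.append(word)
            continue
        # Source position with the highest attention weight at this output step.
        j = max(range(len(src)), key=lambda k: attn[t][k])
        candidates = lexicon.get(src[j], {})
        if candidates:
            # Most probable dictionary translation of the aligned source word.
            result.append(max(candidates, key=candidates.get))
        else:
            # No dictionary entry: copy the source token through unchanged.
            result.append(src[j])
    return result


if __name__ == "__main__":
    src = ["bi", "huhhot", "yabuna"]  # hypothetical romanized Mongolian tokens
    output = ["我", "去", UNK]
    attn = [[0.8, 0.1, 0.1], [0.1, 0.2, 0.7], [0.05, 0.9, 0.05]]
    lexicon = {"huhhot": {"呼和浩特": 0.9, "呼市": 0.1}}
    print(replace_unk(output, src, attn, lexicon))  # ['我', '去', '呼和浩特']
```

Copying the source token when no dictionary entry exists keeps the output length unchanged; a real system might instead transliterate names or drop the token.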
[Degree-granting institution]: Inner Mongolia University
[Degree level]: Master's
[Year awarded]: 2017
[CLC classification number]: TP391.2



