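Research on Chinese-Vietnamese Machine Translation Methods Incorporating Topics
Published: 2018-05-17 22:43
Topics: Chinese-Vietnamese + statistical machine translation; Source: Master's thesis, Kunming University of Science and Technology, 2017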
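【Abstract】: Vietnam borders China's Yunnan and Guangxi provinces, and, driven by national development strategy, exchanges between Vietnam and China have become close. Chinese-Vietnamese machine translation can promote cooperation between the two countries in tourism, e-commerce, science and technology, and other areas. Traditional statistical translation models mainly compute phrase translation probabilities and lexical translation probabilities between the source and target languages. However, these probabilities cannot accurately measure the semantic similarity between source and target, so a translation may not be semantically equivalent to the original and may even contain serious semantic errors. To address these problems, this thesis carries out a series of studies based on the tree-to-tree translation model. The main results are as follows:
(1) A tree-to-tree translation model incorporating phrase topics. Because of the complexity of natural language, current Chinese-Vietnamese machine translation has difficulty handling domain-ambiguous words. To improve translation quality, this thesis proposes a phrase-topic semantic translation model that, during tree-to-tree decoding, replaces the original phrase translation probability feature function and uses the distributional relationship between a phrase and the topic of the sentence it appears in to constrain phrase selection. Incorporating the phrase-topic relationship in this way achieves domain adaptation to a certain extent. Comparative experiments show that, given domain corpora of sufficient size, Chinese-Vietnamese machine translation with phrase topics significantly improves the translation of domain-ambiguous words.
(2) A tree-to-tree translation model incorporating a sentence coherence model. Current Chinese-Vietnamese machine translation essentially models translation one sentence at a time, ignoring rich discourse-level information, which does not match how humans translate. To address the missing cross-sentence discourse structure when translating Chinese-Vietnamese documents, this thesis models translation at the sentence level and proposes a sentence coherence translation model that represents coherence as the smooth transition of topics between sentences, which makes coherence quantitatively describable and computable. A coherence chain of the source document is built with existing tools and mapped to the target side, and the mapped chain is then used to constrain translation selection. Experiments show that, for document translation, Chinese-Vietnamese machine translation incorporating coherence substantially improves the coherence of the translated text.
(3) A topic-integrated Chinese-Vietnamese statistical machine translation prototype system. Building on the open-source machine translation system NiuTrans, we follow the log-linear model to integrate the phrase topic model and the sentence coherence model into a Chinese-Vietnamese tree-to-tree translation system. Using several existing open-source tools, we then developed a JavaWeb application on the Linux platform, with JSP for the presentation layer, plain servlets as a lightweight framework, and a back end that calls the machine translation interface, resulting in a topic-integrated Chinese-Vietnamese statistical machine translation prototype.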
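The abstract does not give the exact form of the phrase-topic feature in result (1). As a minimal sketch, assume each translation option for a source phrase carries a topic distribution (for example estimated with LDA; this is an assumption, not stated in the source) and the source sentence has its own topic distribution, so the option whose topics best match the sentence is preferred. Hellinger similarity is used below as one common way to compare two distributions, not necessarily the measure used in the thesis, and the function names and toy numbers are hypothetical. In the thesis such a score replaces the phrase translation probability feature inside tree-to-tree decoding; the sketch covers only the scoring step.

```python
import math
from typing import Dict, List, Tuple

# A topic distribution is a probability vector over K topics. Here the vectors
# are given directly; estimating them (e.g. with LDA) is assumed, not shown.
TopicDist = List[float]


def hellinger_similarity(p: TopicDist, q: TopicDist) -> float:
    """Return 1 - Hellinger distance between p and q, a value in [0, 1].

    1.0 means identical topic distributions; 0.0 means no shared mass.
    Hypothetical helper: the thesis's actual formula is not given.
    """
    if len(p) != len(q):
        raise ValueError("topic distributions must cover the same topics")
    # Bhattacharyya coefficient: overlap between the two distributions.
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Clamp so floating-point rounding cannot produce sqrt of a negative.
    return 1.0 - math.sqrt(max(0.0, 1.0 - bc))


def rank_candidates(
    sentence_topics: TopicDist,
    candidates: Dict[str, TopicDist],
) -> List[Tuple[str, float]]:
    """Rank candidate translations of one source phrase by how closely each
    candidate's topic distribution matches the source sentence's topics."""
    scored = [
        (target, hellinger_similarity(sentence_topics, topics))
        for target, topics in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy example for the source phrase 苹果: topics are [fruit/food, technology],
# and the source sentence is mostly about technology.
sentence = [0.1, 0.9]
options = {"quả táo": [0.9, 0.1], "Apple": [0.1, 0.9]}
print(rank_candidates(sentence, options))
```

In this toy setting the company reading "Apple" ranks above "quả táo" (the fruit), which is the kind of domain-ambiguity resolution result (1) targets.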
[Abstract]: Vietnam is adjacent to Yunnan and Guangxi. Sino-Vietnamese machine translation can promote cooperation in tourism, e-commerce, and science and technology. Traditional statistical translation models mainly compute the phrase translation probability and the lexical translation probability between the source and target languages. However, these probabilities cannot accurately measure the semantic similarity between the source language and the target language. They may leave the translation semantically non-equivalent to the original text, and may even cause serious semantic translation errors. Based on the tree-to-tree translation model, this thesis carries out a series of studies on these problems. The main results are as follows:
(1) A tree-to-tree translation model integrating phrase topics. Due to the complexity of natural language, Chinese-Vietnamese machine translation has great difficulty handling domain-ambiguous words. To improve the quality of Chinese-Vietnamese translation, this thesis proposes a phrase-topic semantic translation model, which replaces the original phrase translation probability feature function in tree-to-tree decoding. Phrase selection is constrained by the distributional relationship between a phrase and the topic of its sentence. To a certain extent, this method achieves domain adaptation. Comparative experimental results show that Chinese-Vietnamese machine translation with phrase topics significantly improves the translation of domain-ambiguous words.
(2) A tree-to-tree translation model integrating a sentence coherence model. At present, Chinese-Vietnamese machine translation models are built on single sentences, ignoring the rich information at the discourse level, which is not in line with human translation practice. To solve the problem of missing cross-sentence discourse structure information in Chinese-Vietnamese document translation, this thesis models translation at the sentence level and proposes a sentence coherence translation model, in which the smooth transition of topics is used to represent sentence coherence. This solves the problem of describing and computing coherence quantitatively. The coherence chain of the source document is constructed with a tool and mapped to the target side, and the mapped coherence chain is then used to constrain translation selection. The experimental results show that integrating coherence into Chinese-Vietnamese machine translation greatly improves the coherence of document translations.
(3) A topic-integrated Chinese-Vietnamese statistical machine translation prototype system. Based on the open-source machine translation system NiuTrans, we use the log-linear model to integrate the phrase topic model and the sentence coherence model into the Chinese-Vietnamese tree-to-tree translation system, together with some basic open-source tools. The system was developed on the Linux platform as a JavaWeb application: the front end uses JSP for the presentation layer, the framework is the simpler servlet, and the back end calls the machine translation interface. In this way, a topic-integrated Chinese-Vietnamese statistical machine translation prototype system is built.
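The abstract says only that coherence is represented by the smooth transition of topics between sentences and that the new models enter NiuTrans through a log-linear model. The sketch below illustrates both ideas under stated assumptions: each sentence's topic distribution is given; coherence between adjacent sentences is taken as 1 minus the Jensen-Shannon divergence (in bits, so it lies in [0, 1]), which is a swapped-in measure rather than the thesis's definition; and a candidate's score is the weighted sum of its feature values, score = Σ λ_i h_i. Building the coherence chain and mapping it to the target side are not shown. The function names, the feature name "lm", and the weights are hypothetical and are not NiuTrans's API.

```python
import math
from typing import Dict, List, Tuple

TopicDist = List[float]  # probability vector over K topics (assumed given)


def kl_divergence(p: TopicDist, q: TopicDist) -> float:
    """KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: TopicDist, q: TopicDist) -> float:
    """Jensen-Shannon divergence in bits; always within [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # m_i > 0 wherever p_i > 0
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def coherence(prev_topics: TopicDist, cur_topics: TopicDist) -> float:
    """Hypothetical coherence: near 1.0 when topics shift smoothly."""
    return 1.0 - js_divergence(prev_topics, cur_topics)


def chain_coherence(doc_topics: List[TopicDist]) -> float:
    """Average coherence over adjacent sentence pairs (1.0 if fewer than 2)."""
    pairs = list(zip(doc_topics, doc_topics[1:]))
    if not pairs:
        return 1.0
    return sum(coherence(a, b) for a, b in pairs) / len(pairs)


def log_linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Log-linear model score: sum of lambda_i * h_i over the named features.
    Every feature name must have a weight, otherwise KeyError is raised."""
    return sum(weights[name] * value for name, value in features.items())


def select_best(
    prev_topics: TopicDist,
    candidates: Dict[str, Tuple[Dict[str, float], TopicDist]],
    weights: Dict[str, float],
) -> str:
    """Return the candidate with the highest log-linear score after adding a
    coherence feature measured against the previous sentence's topics."""
    def score(name: str) -> float:
        features, topics = candidates[name]
        with_coherence = dict(features, coherence=coherence(prev_topics, topics))
        return log_linear_score(with_coherence, weights)

    return max(candidates, key=score)
```

For instance, with a previous-sentence distribution of [0.7, 0.3] and a coherence weight of 2.0, a candidate with topics [0.8, 0.2] and a language-model score of -3.6 outscores one with topics [0.2, 0.8] and a score of -3.5, because its coherence bonus more than offsets the small language-model deficit. Keeping coherence as just another weighted feature is what lets it be tuned alongside the existing features in a log-linear framework.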
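【Degree-granting institution】: Kunming University of Science and Technology
【Degree level】: Master's
【Year of award】: 2017
【CLC classification number】: TP391.2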
Article ID: 1903227
Article link: https://www.wllwen.com/jingjilunwen/dianzishangwulunwen/1903227.html