Research on Chinese-Vietnamese Machine Translation Methods for the Metallurgical Domain
Topic: machine translation + Chinese-Vietnamese; Source: doctoral dissertation, Kunming University of Science and Technology, 2016
[Abstract]: Machine translation is the most effective means of cross-lingual information exchange, and with the implementation of China's national "Belt and Road" strategy, Chinese-Vietnamese machine translation is becoming increasingly important. China and Vietnam cooperate extensively in the metallurgical industry, which creates heavy demand for translating metallurgical texts, scientific literature, and industry information. Translating this material automatically is of great significance for promoting cooperation and information exchange between the two countries' metallurgical industries. Research on Chinese-Vietnamese machine translation is still relatively weak, and work on domain-specific translation is scarcer still, which severely limits industry-oriented cross-lingual information exchange. Chinese and Vietnamese differ considerably as languages, and translation for a specific industry has many domain characteristics of its own, so traditional translation methods are not fully suited to Chinese-Vietnamese machine translation in the metallurgical domain. The open problems are the acquisition of bilingual domain terminology, the automatic annotation of bilingual word alignments, and machine translation adapted to both the Chinese-Vietnamese language differences and the domain characteristics. Taking these language differences and the characteristics of the metallurgical domain together, this dissertation investigates key techniques and methods for Chinese-Vietnamese machine translation in the metallurgical domain: Chinese-Vietnamese bilingual term acquisition, Chinese-Vietnamese word alignment, tree-to-tree syntactic statistical machine translation that incorporates language differences, and syntactic statistical machine translation that incorporates domain characteristics. The main contributions are as follows.
(1) To address the difficulty of acquiring bilingual terms caused by the scarcity of Chinese-Vietnamese domain corpora, a pivot-language-based method for automatically acquiring bilingual metallurgical terms is proposed. Drawing on existing Chinese-English and English-Vietnamese parallel domain texts and scientific literature, a conditional random field (CRF) model performs term recognition on the Chinese domain text on the source-language side. Then, following the idea of phrase-based statistical machine translation, a Chinese-English phrase probability table and an English-Vietnamese phrase probability table are built, and a Chinese-to-Vietnamese phrase probability table is obtained by mapping through English as the pivot. Finally, the Chinese-Vietnamese phrase table is filtered with the recognized Chinese domain terms to build a Chinese-Vietnamese bilingual terminology base for the metallurgical domain. Experiments show that the method achieves good term extraction results and effectively addresses the difficulty of extracting Chinese-Vietnamese metallurgical terms when aligned Chinese-Vietnamese resources are scarce.
(2) For the automatic annotation of Chinese-Vietnamese word alignments, a word alignment method that combines language-difference features with deep learning is proposed. Based on the differences between Chinese and Vietnamese in postposed attributives, postposed adverbials, and the positions of linguistic structures, position-transformation functions and structure-adjustment functions are defined and used as constraints, so that the structural differences are incorporated into the loss function of a bidirectional RNN, improving the performance and accuracy of word alignment learning. Chinese-Vietnamese word alignment experiments show that the method performs well and that linguistic features and bidirectional context information effectively improve word alignment.
(3) To account for the differences between Chinese and Vietnamese, a Chinese-Vietnamese tree-to-tree statistical machine translation method that incorporates language characteristics is proposed. Language-difference features are useful for machine translation. The differences between the two languages are analyzed, Chinese-Vietnamese language-difference rules are defined, and linguistic features such as a postposed-attributive reward, a postposed-time-adverbial reward, and a postposed-place-adverbial reward are introduced. Using a Chinese-Vietnamese word-aligned corpus, the language-difference features are incorporated into tree-to-tree translation rule extraction during template extraction. During decoding, the language-difference rules are used to prune and optimize candidate sentences and obtain the optimal translation sequence, improving the efficiency and accuracy of both template extraction and decoding. Chinese-Vietnamese sentence translation experiments show that the method achieves good results and that exploiting syntactic differences effectively improves translation performance and accuracy.
(4) To improve the translation of domain text, a Chinese-Vietnamese syntactic statistical machine translation method that incorporates domain characteristics is proposed. The characteristics of the domain and their influence on machine translation are analyzed. Using domain terms and corpora, a bilingual term-topic distribution model, a paragraph-level domain topic coherence model, and a Freebase-based domain knowledge model are built. Within the tree-to-tree translation model that incorporates language characteristics, the bilingual domain terminology base, the bilingual term-topic probability distribution, paragraph-level domain coherence, and domain knowledge relations are applied to decoding steps such as the selection, combination, and pruning of candidate translations, so that domain characteristics are used more effectively to improve domain translation. Chinese-Vietnamese translation experiments in the metallurgical domain show that the method achieves good results and that domain topics, paragraph topic coherence, and domain knowledge clearly improve the translation of domain text.
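The pivot step in contribution (1) can be illustrated with a minimal sketch. It assumes the common triangulation formula p(vi|zh) = Σ_en p(en|zh) · p(vi|en), which the abstract does not spell out, so it may differ from the dissertation's actual scoring. The function names `triangulate` and `filter_terms`, the `min_prob` threshold, and all probabilities are hypothetical toy values, and the set of Chinese terms is supplied by hand here rather than recognized by a CRF.

```python
from collections import defaultdict


def triangulate(zh_en, en_vi):
    """Combine two phrase tables through the English pivot.

    zh_en maps zh phrase -> {en phrase: p(en|zh)}
    en_vi maps en phrase -> {vi phrase: p(vi|en)}
    Returns zh phrase -> {vi phrase: p(vi|zh)}, summing over shared pivots.
    """
    zh_vi = defaultdict(lambda: defaultdict(float))
    for zh, en_probs in zh_en.items():
        for en, p_en in en_probs.items():
            # Pivot phrases missing from the en-vi table contribute nothing.
            for vi, p_vi in en_vi.get(en, {}).items():
                zh_vi[zh][vi] += p_en * p_vi
    return {zh: dict(vi_probs) for zh, vi_probs in zh_vi.items()}


def filter_terms(zh_vi, zh_terms, min_prob=0.1):
    """Keep the best Vietnamese candidate for each recognized Chinese term."""
    lexicon = {}
    for zh in zh_terms:
        candidates = zh_vi.get(zh)
        if not candidates:
            continue
        vi, prob = max(candidates.items(), key=lambda kv: kv[1])
        if prob >= min_prob:
            lexicon[zh] = (vi, prob)
    return lexicon


# Toy phrase tables (illustrative values only, not from the dissertation).
zh_en = {
    "高炉": {"blast furnace": 0.9, "furnace": 0.1},
    "的": {"of": 1.0},
}
en_vi = {
    "blast furnace": {"lò cao": 0.8, "lò": 0.2},
    "furnace": {"lò": 1.0},
    "of": {"của": 1.0},
}

zh_vi = triangulate(zh_en, en_vi)
# Hand-supplied stand-in for the CRF term recognizer's output.
lexicon = filter_terms(zh_vi, zh_terms={"高炉"})
print(lexicon)
```

Filtering by the recognized Chinese terms discards general-vocabulary entries such as the function word in the toy table, leaving only domain term pairs in the resulting lexicon.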
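[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Doctorate
[Year conferred]: 2016
[Classification number]: H44; TF0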
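Dissertation record:
Gao Shengxiang (高盛祥); Research on Chinese-Vietnamese Machine Translation Methods for the Metallurgical Domain [D]; Kunming University of Science and Technology; 2016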