
Nonparametric Bayesian Phrase Induction for Multi-Domain Machine Translation

Published: 2018-05-15 15:17

  Topics: multi-domain machine translation + nonparametric Bayesian; Source: Journal of Harbin Engineering University, 2017, Issue 10


[Abstract]: Multi-domain machine translation has long been a focus of machine translation research, and phrase induction is central to it. Traditional weighting methods do not take the whole induction process into account. This paper proposes performing phrase induction with a hierarchical Pitman-Yor process and introduces multiple channels into the model, so that the influence of each domain is balanced during phrase induction. From a modeling perspective, the method is a generative model, which is more expressive, and it models alignment and phrase extraction jointly, overcoming the effect of alignment errors on conventional phrase extraction. In terms of complexity, the model is independent of decoding and is therefore easier to train. In terms of multi-domain fusion, fusion is carried out during phrase induction itself, so the whole induction process is better taken into account. Machine translation performance was evaluated on two different types of corpora; BLEU scores improved relative to traditional single-domain heuristic phrase extraction and multi-domain weighting.
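[Affiliations]: School of Software, Harbin University of Science and Technology; College of Computer Science and Technology, Harbin Engineering University; School of Computer Science, Harbin Institute of Technology
[Funding]: National Natural Science Foundation of China, Youth Program (61300115); China Postdoctoral Science Foundation (2014M561331); Science and Technology Research Project of the Heilongjiang Provincial Department of Education (12521073)
[Classification]: H085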
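
The abstract names a hierarchical Pitman-Yor process but gives no formulas, so the following is only a minimal sketch of the general technique, not the paper's model. It assumes a two-level hierarchy in which each domain has its own Pitman-Yor "restaurant" that backs off to a shared parent, which in turn backs off to a uniform base over a toy phrase-pair vocabulary. The class `PYRestaurant`, the function `train_demo`, the vocabulary, the domain names, and the discount and strength values are all hypothetical. Joint alignment modeling and the multi-channel mechanism described in the abstract are not implemented here.

```python
import random

# Hypothetical toy vocabulary of (source phrase, target phrase) pairs.
VOCAB = [("银行", "bank"), ("银行", "shore"),
         ("处方", "prescription"), ("处方", "recipe")]


def uniform(pair):
    """Uniform base distribution over the toy vocabulary."""
    return 1.0 / len(VOCAB)


class PYRestaurant:
    """One Pitman-Yor process in Chinese-restaurant-process form."""

    def __init__(self, discount, strength, base, rng):
        self.d = discount        # discount, 0 <= d < 1
        self.theta = strength    # strength, kept > 0 in this sketch
        self.base = base         # parent PYRestaurant, or a function pair -> prob
        self.rng = rng
        self.tables = {}         # pair -> list of customer counts, one per table
        self.n_customers = 0
        self.n_tables = 0

    def base_prob(self, pair):
        if isinstance(self.base, PYRestaurant):
            return self.base.prob(pair)
        return self.base(pair)

    def prob(self, pair):
        """Predictive probability of a phrase pair given the current seating."""
        counts = self.tables.get(pair, [])
        denom = self.theta + self.n_customers
        seen = (sum(counts) - self.d * len(counts)) / denom
        backoff = (self.theta + self.d * self.n_tables) / denom
        return seen + backoff * self.base_prob(pair)

    def add(self, pair):
        """Seat one observation; a new table also sends a customer to the parent."""
        counts = self.tables.setdefault(pair, [])
        weights = [n_k - self.d for n_k in counts]
        weights.append((self.theta + self.d * self.n_tables) * self.base_prob(pair))
        choice = self.rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(counts):
            counts.append(1)
            self.n_tables += 1
            if isinstance(self.base, PYRestaurant):
                self.base.add(pair)
        else:
            counts[choice] += 1
        self.n_customers += 1


def train_demo(seed=0):
    """Build a shared parent and two domain restaurants from toy counts."""
    rng = random.Random(seed)
    shared = PYRestaurant(0.5, 1.0, uniform, rng)
    domains = {name: PYRestaurant(0.5, 1.0, shared, rng)
               for name in ("news", "medical")}
    for _ in range(5):
        domains["news"].add(("银行", "bank"))
        domains["medical"].add(("处方", "prescription"))
    return shared, domains


if __name__ == "__main__":
    shared, domains = train_demo()
    for name, restaurant in domains.items():
        for pair in VOCAB:
            print(name, pair, round(restaurant.prob(pair), 3))
```

In this sketch, a phrase pair observed in only one domain still receives nonzero probability in the other domain through the shared parent, which is the kind of cross-domain sharing a hierarchical prior provides. How the paper actually balances domains is not stated in the abstract.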


Article ID: 1892865


Article link: https://www.wllwen.com/wenyilunwen/yuyanyishu/1892865.html

