Improving the Sentence Segmentation Model in Machine Translation
Published: 2018-09-05 07:19
【Abstract】: As the training corpora for statistical machine translation (SMT) systems keep growing, they contain more and more long sentences, and how to exploit the information in these long sentences to improve translation quality is one of the main problems facing SMT systems. Building on Xu's sentence segmentation model, this paper proposes a method for segmenting long sentences during the training stage. The method uses automatically acquired boundary-word probabilities and the length ratio of the sub-sentence pairs produced by segmentation to guide the segmentation process, yielding segmentations that better respect the semantics of the sentence. Experiments on the NIST test sets show that the method improves BLEU by up to 0.5 points.
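The abstract names two signals that guide segmentation: a boundary-word probability and the length ratio of the resulting sub-sentence pairs. The following Python sketch shows one way such signals could be combined to pick a single split point in a sentence pair. It is an illustration under stated assumptions, not the paper's method: the linear weighting, the `expected_ratio` and `weight` parameters, and the names `length_ratio_score` and `best_split` are all hypothetical, and it makes no attempt to reproduce Xu's model, which the abstract does not describe.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SplitCandidate:
    src_pos: int   # split the source before this token index
    tgt_pos: int   # split the target before this token index
    score: float   # combined score (hypothetical formula, see best_split)


def length_ratio_score(src_len: int, tgt_len: int, expected_ratio: float) -> float:
    """Return a value in [0, 1]; 1.0 means target/source length equals expected_ratio."""
    if src_len == 0 or tgt_len == 0:
        return 0.0
    ratio = tgt_len / src_len
    return min(ratio, expected_ratio) / max(ratio, expected_ratio)


def best_split(
    src: List[str],
    tgt: List[str],
    boundary_prob: Dict[str, float],
    expected_ratio: float = 1.0,   # assumed typical target/source length ratio
    weight: float = 0.5,           # assumed interpolation weight between the two signals
) -> Optional[SplitCandidate]:
    """Pick the split point that maximizes an assumed linear mix of both signals."""
    best: Optional[SplitCandidate] = None
    for i in range(1, len(src)):
        # Probability that the token just before the split ends a sub-sentence.
        p_boundary = boundary_prob.get(src[i - 1], 0.0)
        for j in range(1, len(tgt)):
            left = length_ratio_score(i, j, expected_ratio)
            right = length_ratio_score(len(src) - i, len(tgt) - j, expected_ratio)
            # Both halves should look balanced, so take the worse of the two.
            score = weight * p_boundary + (1.0 - weight) * min(left, right)
            if best is None or score > best.score:
                best = SplitCandidate(i, j, score)
    return best


if __name__ == "__main__":
    source = ["a", "b", ",", "c", "d"]
    target = ["x", "y", "z", "w"]
    print(best_split(source, target, {",": 0.9}))
```

In this sketch the boundary-word table is passed in as a plain dictionary. The abstract says only that the probabilities are acquired automatically, so how they are estimated is left open here.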
【Author affiliation】: Toshiba (China) Research and Development Center
【Classification number】: H085
Article ID: 2223576
Article link: https://www.wllwen.com/wenyilunwen/yuyanyishu/2223576.html