Research on Word Alignment Methods Based on a Chinese-Vietnamese Bilingual Parallel Corpus
[Abstract]: In recent years, machine translation has become an important means of overcoming the language barriers people face in communication. Bilingual word alignment is a basic step in the automatic acquisition of translation knowledge and, in machine translation in particular, a valuable source of that knowledge. It provides important support for natural language processing research such as Chinese-Vietnamese dictionary compilation, machine translation, speech recognition, information retrieval, semantic disambiguation, and bilingual sentence alignment systems. As a result, the importance of acquiring bilingual word alignment data is increasingly recognized. Research on how to improve the quality of Chinese-Vietnamese bilingual word alignment on the basis of earlier work, and on how to construct a large-scale Chinese-Vietnamese word-aligned corpus, therefore has academic value. At present, word alignment for major language pairs such as Chinese-English and French-English has achieved good results, but work on word alignment between Chinese and Vietnamese is rare. This thesis examines the factors that affect the quality of Chinese-Vietnamese bilingual word alignment and analyzes the problems that arise in the alignment process. Drawing on the linguistic characteristics of Vietnamese and on existing research, the main work is as follows:
(1) A chunk-based Chinese-Vietnamese bilingual word alignment method is proposed. To improve the accuracy of Chinese-Vietnamese bilingual word alignment and to alleviate the asymmetry problem in the alignment process, a Chinese-Vietnamese bilingual chunk-aligned corpus is constructed. On the basis of this corpus, and according to the characteristics of the two languages, a CRFs model is used to perform word alignment within chunks.
(2) A Chinese-Vietnamese bilingual word alignment algorithm that incorporates semantic information is proposed. To address the high error rate in aligning low-frequency words, a lexical similarity model is proposed. A neural network model is trained on monolingual corpora to obtain a word similarity model, which is then used to extend the IBM word alignment model. Finally, Chinese-Vietnamese word alignment is carried out using GIZA combined with the lexical similarity model.
(3) Based on the idea of ensemble learning, a method is proposed that fuses three word alignment models: the model incorporating semantic information, the word2vec word alignment model, and the chunk-based word alignment model. The models are treated as independent alignment classifiers, and simple voting and weighted voting strategies are used to fuse them, further improving word alignment quality. The three word alignment methods are also evaluated and compared.
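The voting-based fusion in item (3) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual voting rule, so everything here is an assumption made for illustration: the function name `vote_alignments`, the representation of an alignment as a set of (source index, target index) links, the example weights, and the rule that a link is kept when it receives more than half of the total weight.

```python
from collections import defaultdict


def vote_alignments(alignments, weights=None, threshold=0.5):
    """Fuse word alignments from several aligners by (weighted) voting.

    alignments: list of sets of (src_index, tgt_index) links, one set per aligner.
    weights:    optional per-aligner weights; None means simple voting (all 1.0).
    threshold:  a link is kept if its score exceeds threshold * total weight
                (hypothetical rule, not taken from the thesis).
    """
    if weights is None:
        weights = [1.0] * len(alignments)
    if len(weights) != len(alignments):
        raise ValueError("need exactly one weight per aligner")

    total = sum(weights)
    score = defaultdict(float)
    # Each aligner adds its weight to every link it proposes.
    for links, w in zip(alignments, weights):
        for link in links:
            score[link] += w
    return {link for link, s in score.items() if s > threshold * total}


# Toy outputs of three aligners for one sentence pair (made-up data).
chunk_based = {(0, 0), (1, 1), (2, 2)}
semantic = {(0, 0), (1, 2), (2, 2)}
word2vec_based = {(0, 0), (1, 1), (2, 3)}
models = [chunk_based, semantic, word2vec_based]

simple = vote_alignments(models)
weighted = vote_alignments(models, weights=[0.2, 0.6, 0.2])
print(sorted(simple))    # [(0, 0), (1, 1), (2, 2)]
print(sorted(weighted))  # [(0, 0), (1, 2), (2, 2)]
```

With simple voting, a link survives when at least two of the three aligners propose it. With weighted voting, a single aligner that carries more than half of the total weight can decide a link alone, which is why (1, 2) replaces (1, 1) in the second result. In practice the weights would presumably be derived from each aligner's measured quality on held-out data.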
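[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1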
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2429178.html