Research and Implementation of Chinese-Character Transliteration of Uyghur Personal Names Based on Syllable Segmentation

Published: 2018-05-17 05:05

  Topics: Uyghur language + syllable segmentation; Source: Xinjiang Normal University, master's thesis, 2014


[Abstract]: Transliterating Uyghur personal names into Chinese characters is an important problem in minority-language information processing and plays a significant role in applications such as machine translation and information retrieval. Because Xinjiang has lacked a unified standard for transliterating minority names into Chinese characters, the same Uyghur name may be written one way on a household registration (hukou), another way on an identity card, yet another way on a passport, and differently again on documents such as money orders. To address this, the thesis analyzes the grapheme-based DOM transliteration framework and Uyghur syllable segmentation, and on that basis studies the Chinese-character transliteration of Uyghur names. The main contents are as follows:

1. The thesis first introduces the grapheme-based DOM transliteration framework and examines whether it is feasible for Uyghur-name transliteration. The framework maps units of source-language text directly to characters of the target language, and transliterating a Uyghur name is in essence a direct mapping from Uyghur letters or syllables to the corresponding Chinese characters, so the framework is used to implement that mapping.
2. Building on the theory and key techniques of Uyghur syllable segmentation, the thesis summarizes the principles of Uyghur syllable decomposition and implements a syllable-decomposition statistics system. The system is run on 5,000 names, the distribution of syllables commonly found in Uyghur names is reported, and 20 syllables commonly used to form Uyghur names are identified.
3. Within the grapheme-based framework, the thesis designs the basic idea and overall architecture of syllable-segmentation-based transliteration of Uyghur names. After analyzing the structure of the Uyghur-name-to-Chinese-character sound correspondence table, it proposes a matrix-based mapping method, which it describes as the fastest and most effective way to map the letters or syllables of a name to Chinese characters. A transliteration system based on syllable segmentation is implemented and tested on 5,000 randomly selected names, reaching an accuracy of only 52%.
4. To raise accuracy, the thesis surveys a large number of Uyghur names, identifies 106 name-forming affixes, and builds supplementary rules based on these affixes, which also make it possible to distinguish the gender of a name. When the rules are applied to the transliteration system and the test is repeated, accuracy rises by 30% and finally reaches 86%, which indicates that the proposed method and rules are feasible and effective.
[Degree-granting institution]: Xinjiang Normal University
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: H215



