Recognition of Complex Organization Names in Cambodian (Khmer) Based on Markov Logic Networks

Published: 2017-12-27 19:14

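  Keywords: Recognition of Complex Organization Names in Cambodian (Khmer) Based on Markov Logic Networks. Source: Kunming University of Science and Technology, 2017 master's thesis. Document type: degree thesis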


  Related topics: Khmer; Tri-training; feature selection; Markov logic network; first-order logic


[Abstract]: As exchanges and cooperation between China and Cambodia grow more frequent, natural language processing for Cambodian (Khmer) has become increasingly important. Because languages differ substantially from one another, named entity recognition methods developed for other languages cannot be transplanted directly to Khmer. To improve the accuracy of Khmer organization name recognition, this thesis studies key problems including the construction of a Khmer organization name recognition model and the expansion of the organization name corpus, and reports the following results. (1) A Tri-training-based method for recognizing Khmer organization names is proposed. The method first uses an improved Tri-training algorithm to combine three different classifiers, based on conditional random fields, support vector machines, and the maximum entropy model, into a single classification system. It then starts from a small amount of labeled corpus and selects newly added samples according to an optimized sample selection strategy, with experiments that take the linguistic characteristics of Khmer into account. The results show that the method can recognize Khmer organization names using only a small amount of labeled corpus. (2) A method for recognizing complex Khmer organization names based on Markov logic networks is proposed. The method first uses a conditional random field model to recognize simple organization names, then derives first-order logic rules from the linguistic characteristics of Khmer, incorporates these rules into a Markov logic network, and uses the LazySAT inference algorithm to recognize complex organization names. The results show that the method achieves better recognition of complex Khmer organization names. (3) A prototype system for Khmer organization name recognition is designed and implemented, providing strong support for research on Khmer named entity recognition.
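The abstract does not list the thesis's first-order rules or their weights, so the following is only a minimal sketch of how a Markov logic network scores a labeling. Each weighted rule contributes its weight once per satisfied grounding, and the most probable labeling is the one with the highest total. The tokens (English placeholders, not Khmer data), the evidence predicates, the four rules, and all weights are hypothetical. Exhaustive enumeration stands in for LazySAT here, which is practical only because the example has four tokens.

```python
from itertools import product
from math import exp

# Hypothetical evidence for a 4-token toy sentence (placeholders, not real Khmer data).
tokens = ["ministry", "education", "announced", "."]
is_org_keyword = [True, False, False, False]
is_noun = [True, True, False, False]
is_punct = [False, False, False, True]

# Hypothetical rule weights; a real system would learn these from data.
W_KEYWORD, W_EXTEND, W_PUNCT, W_PRIOR = 2.0, 1.5, 3.0, 0.5


def implies(a, b):
    """Truth value of the implication a => b."""
    return (not a) or b


def score(world):
    """Sum of weight * (number of satisfied groundings) over all rules.

    world[i] is the truth value of the query atom InOrg(i).
    """
    n = len(world)
    total = 0.0
    for i in range(n):
        # Rule 1: OrgKeyword(i) => InOrg(i)
        total += W_KEYWORD * implies(is_org_keyword[i], world[i])
        # Rule 2: Punct(i) => not InOrg(i)
        total += W_PUNCT * implies(is_punct[i], not world[i])
        # Rule 3 (prior): not InOrg(i)
        total += W_PRIOR * (not world[i])
    for i in range(n - 1):
        # Rule 4: InOrg(i) and Noun(i+1) => InOrg(i+1)
        total += W_EXTEND * implies(world[i] and is_noun[i + 1], world[i + 1])
    return total


def all_worlds():
    """Every truth assignment to InOrg(0..n-1)."""
    return list(product([False, True], repeat=len(tokens)))


def map_world():
    """MAP inference by brute force (a stand-in for LazySAT)."""
    return max(all_worlds(), key=score)


def probability(world):
    """P(world) = exp(score(world)) / Z, with Z summed over all worlds."""
    z = sum(exp(score(w)) for w in all_worlds())
    return exp(score(world)) / z


best = map_world()
org_span = [t for t, inside in zip(tokens, best) if inside]
print(org_span)  # ['ministry', 'education']
```

In this toy setup the keyword rule marks the first token, and the extension rule pulls the following noun into the span. The prior and punctuation rules keep the remaining tokens out. This shows how soft rules can extend a simple name into a longer one.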
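[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification numbers]: TP391.1; H613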

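[Similar literature]

Related newspaper articles (top 2)

1 Staff reporter Yang Ling; Yin Yuexi: An Envoy Who Carries Folk Songs to ASEAN [N]; Nanning Daily; 2008

2 Reporter Li Xinxiong, interns Wei Jinxing and Huang Zhenghe; Promoting the Ten ASEAN Countries, Learning ASEAN Languages [N]; Guangxi Daily; 2004

Related master's theses (top 4)

1 Wang Ruolan; Recognition of Complex Organization Names in Cambodian (Khmer) Based on Markov Logic Networks [D]; Kunming University of Science and Technology; 2017

2 Li Xiaolong (TRY RATANAK); Research on Sentiment Classification of Khmer News Comment Texts [D]; Kunming University of Science and Technology; 2017

3 Yang Ying; A Study of Khmer Affixes [D]; Yunnan Minzu University; 2013

4 Pan Huashan; Research on Khmer Lexical Analysis Methods Based on Conditional Random Fields [D]; Kunming University of Science and Technology; 2014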



Article ID: 1342840


Article link: https://www.wllwen.com/shoufeilunwen/zaizhiboshi/1342840.html

