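Research and Implementation of Key Technologies for Person Name Disambiguation
Published: 2018-03-29 08:16
Topic: person name disambiguation  Focus: organization name recognition  Source: Harbin Institute of Technology, master's thesis, 2012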
[Abstract]: With the arrival of the mobile Internet era, network access has become more convenient and the number of terminals keeps growing, so information is published faster and its volume is increasing rapidly. Searching for information about a specific person is one of the main purposes of web search, yet the prevalence of shared names causes serious person-name ambiguity in Internet text. The results returned by general-purpose search engines do not organize information effectively in the face of this ambiguity, so users spend a great deal of time picking out the person they are interested in from many people with the same name, and they risk missing important information. How to resolve these ambiguities effectively and present information to users in an organized form has therefore become an important problem. To this end, this thesis carries out work in the following four areas.

First, this thesis examines the process by which humans annotate name-ambiguity corpora and proposes a two-stage disambiguation strategy, based on adaptive resonance theory, that imitates this process. In the first stage, categories representing persons are constructed and documents are classified into them; in the second stage, similar categories are merged by hierarchical agglomerative clustering. By imitating human behavior, the system automatically builds the set of target concepts and resolves the ambiguity. Experiments verify the effectiveness of the two-stage strategy: on two sets of person-name recognition results, the two-stage method outperforms the traditional method by 0.92% and 5.00%, respectively.

Second, this thesis implements a human-machine collaborative system that assists in building recognition rules and a variety of knowledge dictionary resources, and uses these resources and rules to build an organization-name recognition system. A comparison with two other named entity recognition tools, ISLEX and LTP, shows that the rule-based method offers high performance and efficiency under the recognition requirements of the name disambiguation task and is well suited to practical name disambiguation systems.

Third, this thesis annotates the Sogou full-web news corpus, yielding a real-world web corpus resource usable for research on Internet person-name disambiguation; analyzes the importance of person attributes for Internet corpora and the characteristics of each attribute; designs and implements a person attribute extraction system for unstructured information on the web; and finally verifies the effectiveness of person attribute features through experiments on the real-world web corpus.

Fourth, this thesis analyzes the tasks and functions of a name disambiguation system, designs and implements a knowledge-resource-based name disambiguation module, completes modules for page crawling, page analysis, knowledge-resource-based name disambiguation, and data storage, implements an intuitive algorithm for ranking disambiguation results, and builds a disambiguation system for news retrieval results.
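The two-stage strategy described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual features, similarity measure, or parameters, so everything below is an assumption made for illustration: documents are bags of words, similarity is cosine similarity, a `vigilance` threshold stands in for the adaptive-resonance matching test, and the names `stage_one`, `stage_two`, and `merge_threshold` are hypothetical. It is a simplified stand-in, not the author's implementation.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def stage_one(docs, vigilance):
    """ART-style single pass (simplified): each document joins its
    best-matching category if the match passes the vigilance threshold,
    otherwise it opens a new category."""
    categories = []  # each category: {"proto": Counter, "members": [indices]}
    for idx, doc in enumerate(docs):
        vec = Counter(doc)
        best = max(categories, key=lambda c: cosine(vec, c["proto"]), default=None)
        if best is not None and cosine(vec, best["proto"]) >= vigilance:
            best["proto"].update(vec)      # adapt the prototype to the new member
            best["members"].append(idx)
        else:
            categories.append({"proto": vec, "members": [idx]})
    return categories


def stage_two(categories, merge_threshold):
    """Agglomerative merging: repeatedly merge the most similar pair of
    categories until no pair reaches merge_threshold."""
    cats = [{"proto": Counter(c["proto"]), "members": list(c["members"])}
            for c in categories]
    while len(cats) > 1:
        pairs = [(cosine(cats[i]["proto"], cats[j]["proto"]), i, j)
                 for i in range(len(cats)) for j in range(i + 1, len(cats))]
        sim, i, j = max(pairs)
        if sim < merge_threshold:
            break
        cats[i]["proto"].update(cats[j]["proto"])
        cats[i]["members"].extend(cats[j]["members"])
        del cats[j]
    return [sorted(c["members"]) for c in cats]


# Toy documents that all mention the same (ambiguous) person name.
docs = [
    ["professor", "harbin", "nlp"],
    ["professor", "harbin", "nlp", "search"],
    ["striker", "football", "club"],
    ["football", "club", "goal"],
    ["nlp", "harbin", "lecture"],
]
fine_grained = stage_one(docs, vigilance=0.8)
clusters = stage_two(fine_grained, merge_threshold=0.6)
print(len(fine_grained), clusters)
```

In this sketch a strict first-stage threshold produces small, high-purity categories, and the looser second-stage threshold then merges categories that likely refer to the same person. Both threshold values are arbitrary choices for the toy data.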
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP391.3
Article ID: 1680297
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1680297.html