Research on a Named Entity Linking Method Based on the BTM Topic Model
[Abstract]: As network resources expand, the growing volume of information makes it increasingly difficult for people to obtain valuable information. Short texts such as tweets and Weibo posts have become widespread, yet people struggle to extract content of interest from them. As a result, resolving the ambiguity of named entity mentions has become both a key problem and a difficult one. Named entity linking is an important way to address this problem. It links a given named entity in a document to an unambiguous entity in a knowledge base, which includes merging synonymous entities and disambiguating ambiguous ones. This technology can improve the information-filtering ability of practical applications such as online recommendation systems and Internet search engines. Short texts are brief and informal in language, so this paper proposes a named entity linking method for short texts based on the BTM (Biterm Topic Model) topic model. First, offline Wikipedia data is used to construct a named entity knowledge base, a synonym table, and an ambiguity lexicon. Named entities in short text are identified with a combination of rule-based and statistical methods. Named entities in short text appear in diverse surface forms, so mentions are normalized with the synonym table of the knowledge base. Candidate entity sets are then obtained from the ambiguity lexicon and pruned according to the contextual features of each named entity. This reduces the size of the candidate entity set and improves the efficiency of candidate entity ranking. In addition, the MPM word co-occurrence measure computes a co-occurrence degree coefficient from co-occurrence frequency alone and ignores how often each individual word occurs. This paper improves the measure so that the coefficient accounts for both the co-occurrence frequency of a word pair and the occurrence frequency of each single word.
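The candidate-generation and co-occurrence steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The dictionaries `SYNONYMS`, `AMBIGUITY`, and `ENTITY_CONTEXT` are hypothetical toy stand-ins for the Wikipedia-derived synonym table, ambiguity lexicon, and entity context features. Pruning by context-word overlap is also an assumed rule. The abstract gives neither the MPM formula nor the improved coefficient, so `cooccurrence_coefficients` substitutes the equivalence coefficient from co-word analysis, c_ab^2 / (c_a * c_b). Like the improvement described, it uses both pair frequencies and single-word frequencies.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy resources standing in for the Wikipedia-derived tables.
SYNONYMS = {"NYC": "New York City", "Big Apple": "New York City"}   # synonym table
AMBIGUITY = {                                                        # ambiguity lexicon
    "Jordan": ["Michael Jordan", "Jordan (country)", "Jordan River"],
    "New York City": ["New York City"],
}
ENTITY_CONTEXT = {                       # assumed context keywords per entity
    "Michael Jordan": {"nba", "bulls", "basketball"},
    "Jordan (country)": {"amman", "kingdom", "desert"},
    "Jordan River": {"river", "dead", "sea"},
    "New York City": {"manhattan", "city"},
}


def candidates(mention, context_words, min_overlap=1):
    """Normalize the mention, look up candidates, and prune by context overlap."""
    name = SYNONYMS.get(mention, mention)          # synonym normalization
    cands = AMBIGUITY.get(name, [])                # candidate set from the ambiguity lexicon
    if len(cands) <= 1:
        return cands                               # nothing to disambiguate
    ctx = {w.lower() for w in context_words}
    kept = [c for c in cands
            if len(ENTITY_CONTEXT.get(c, set()) & ctx) >= min_overlap]
    return kept or cands                           # assumed fallback: keep all if pruning empties the set


def cooccurrence_coefficients(docs):
    """Equivalence coefficient c_ab^2 / (c_a * c_b) over document frequencies.

    A stand-in for the thesis's improved MPM measure, whose formula is not given.
    """
    word_df, pair_df = Counter(), Counter()
    for doc in docs:
        vocab = sorted(set(doc))
        word_df.update(vocab)                      # texts containing each word
        pair_df.update(combinations(vocab, 2))     # texts containing each word pair
    return {(a, b): c_ab ** 2 / (word_df[a] * word_df[b])
            for (a, b), c_ab in pair_df.items()}
```

In this toy setup, `candidates("Jordan", ["NBA", "Bulls"])` keeps only "Michael Jordan". A context with no overlapping words falls back to all three candidates. Unlike a raw co-occurrence count, the coefficient is damped for pairs in which either word is frequent on its own.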
Secondly, this paper assumes that the words in a document have topic distributions similar to that of the named entities in it. On that basis, it models documents at the semantic level for disambiguation and proposes a named entity linking method based on the BTM topic model. The method models named entity semantics with a BTM model built on the co-occurrence coefficient and estimates the parameters by Gibbs sampling. This makes the model simpler and more accurate and provides a theoretical basis for subsequent data processing. Finally, the method computes the cosine similarity between the location vectors of the named entity and each candidate entity. Based on that similarity, it links the named entity in the given text to an unambiguous named entity in the knowledge base.
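The BTM modeling step and the final cosine-similarity linking step can be sketched as below. The sampler follows the collapsed Gibbs update of the original Biterm Topic Model, in which every unordered word pair in a short text is a biterm. However, it is plain BTM, because the abstract does not say how the co-occurrence coefficient enters the model. The sketch represents the mention by the topic distribution of its short text and each candidate by the topic distribution of a description text, such as its Wikipedia text. This representation is an assumption standing in for the "location vector" mentioned above. All function names are hypothetical.

```python
import math
import random
from itertools import combinations


def biterms(doc):
    """All unordered word pairs in a short text (BTM treats the whole text as one window)."""
    return list(combinations(doc, 2))


def train_btm(docs, K, alpha=None, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain BTM; returns (theta, phi, word index)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    idx = {w: i for i, w in enumerate(vocab)}
    M = len(vocab)
    alpha = 50.0 / K if alpha is None else alpha
    B = [(idx[a], idx[b]) for d in docs for a, b in biterms(d)]
    n_z = [0] * K                          # biterms assigned to each topic
    n_wz = [[0] * M for _ in range(K)]     # word counts per topic
    z = []

    def assign(b, k, delta):
        wi, wj = B[b]
        n_z[k] += delta
        n_wz[k][wi] += delta
        n_wz[k][wj] += delta

    for b in range(len(B)):                # random initial topics
        z.append(rng.randrange(K))
        assign(b, z[b], 1)
    for _ in range(iters):
        for b, (wi, wj) in enumerate(B):
            assign(b, z[b], -1)            # remove the current assignment
            weights = [
                (n_z[k] + alpha) * (n_wz[k][wi] + beta) * (n_wz[k][wj] + beta)
                / (2 * n_z[k] + M * beta) ** 2
                for k in range(K)
            ]
            z[b] = rng.choices(range(K), weights=weights)[0]
            assign(b, z[b], 1)
    theta = [(n_z[k] + alpha) / (len(B) + K * alpha) for k in range(K)]
    phi = [[(n_wz[k][w] + beta) / (2 * n_z[k] + M * beta) for w in range(M)]
           for k in range(K)]
    return theta, phi, idx


def doc_topics(doc, theta, phi, idx):
    """P(z|d) = sum over the text's biterms of P(z|b) * P(b|d), with P(b|d) uniform."""
    K = len(theta)
    bs = [(idx[a], idx[b]) for a, b in biterms(doc) if a in idx and b in idx]
    if not bs:
        return [1.0 / K] * K               # no known biterms: uniform topics
    p = [0.0] * K
    for wi, wj in bs:
        pz_b = [theta[k] * phi[k][wi] * phi[k][wj] for k in range(K)]
        s = sum(pz_b)
        for k in range(K):
            p[k] += pz_b[k] / s / len(bs)
    return p


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def link(context, candidates, theta, phi, idx):
    """Pick the candidate whose description topics are most cosine-similar to the context."""
    q = doc_topics(context, theta, phi, idx)
    return max(candidates,
               key=lambda e: cosine(q, doc_topics(candidates[e], theta, phi, idx)))
```

Consider a toy corpus of basketball texts and geography texts that all contain "jordan". Training with K=2 and linking the context ["nba", "bulls"] should prefer a basketball candidate over the country. On real data the outcome depends on K, alpha, beta, and the number of sampling iterations.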
【Degree-granting institution】: Dalian Maritime University
【Degree level】: Master's
【Year awarded】: 2017
【Classification number】: TP391.1