
Research on Methods for Constructing a Person Relationship Graph from Weibo

Published: 2019-06-12 03:17
[Abstract]: With the rapid growth of the Internet, an ever larger number of users take part in it and generate a large volume of data every day, much of which carries useful information. The focus of this thesis is how to extract useful structured data from this unstructured text. These natural-language documents describe a large number of social relationships between people, and extracting those relationships automatically is valuable for social relationship analysis. A bootstrapping relation extraction system is well suited to the Weibo environment, and this thesis proposes four improvements to that model.

First, a graph-based ranking algorithm is proposed. The bootstrapping relation extraction model extracts pairs of person entities that hold a given relation. To improve its performance, the proposed algorithm ranks the model's output by taking into account how similar each result is to the seed set.

Second, a seed set construction model for the target relation is proposed. Traditional seed set construction for relation extraction requires heavy manual intervention, which lowers experimental efficiency. The proposed method builds a Chinese semantic knowledge base from Baidu Baike, classifies the relations in that knowledge base (only three categories of relation are considered in this thesis), and then combines the knowledge base with a search engine to construct the seed set.

Third, the similarity measure for entity pairs is improved. The graph-based ranking algorithm requires an entity-pair graph, so the way similarity between entity pairs is computed is important. The thesis replaces the original measure between two entity pairs with Latent Relational Analysis (LRA), which reduces dimensionality and noise and thereby improves the accuracy of the computation.

Fourth, the similarity measure for content patterns is improved. The graph-based ranking algorithm also requires a content-pattern graph, so the similarity between content patterns is equally important. Content patterns are represented as path-enclosed trees and compared with a convolution tree kernel, which improves the accuracy of the similarity.

Finally, a visual person relationship graph is constructed. Experiments demonstrate the applicability and feasibility of the work. The proposed method can be applied to relation extraction for any relation type and is highly extensible.
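The convolution tree kernel mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which kernel variant the thesis uses, so the code below assumes the standard subset-tree formulation of Collins and Duffy, with a decay factor. The tree encoding (nested tuples), the function names, and the two toy trees are all hypothetical; they stand in for the path-enclosed trees that the thesis would derive from parsed Weibo sentences.

```python
from functools import lru_cache

# Assumed encoding: a tree is a nested tuple (label, child, child, ...);
# a leaf word is a plain string.

def production(node):
    """Label of the node plus a tagged description of each child."""
    return (node[0],) + tuple(
        ("w", c) if isinstance(c, str) else ("n", c[0]) for c in node[1:]
    )

def is_preterminal(node):
    """True when every child of the node is a leaf word."""
    return all(isinstance(c, str) for c in node[1:])

def internal_nodes(tree):
    """Collect every non-leaf node of the tree, root first."""
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes

def tree_kernel(t1, t2, decay=0.5):
    """Weighted count of the tree fragments shared by t1 and t2."""

    @lru_cache(maxsize=None)
    def delta(n1, n2):
        # Fragments rooted at n1 and n2 can only match if the productions match.
        if production(n1) != production(n2):
            return 0.0
        if is_preterminal(n1):
            return decay
        score = decay
        for c1, c2 in zip(n1[1:], n2[1:]):
            if not isinstance(c1, str):  # leaf words add no further fragments
                score *= 1.0 + delta(c1, c2)
        return score

    return sum(delta(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))

def normalized_kernel(t1, t2, decay=0.5):
    """Scale the kernel into [0, 1] so trees of different sizes are comparable."""
    k11 = tree_kernel(t1, t1, decay)
    k22 = tree_kernel(t2, t2, decay)
    if k11 == 0 or k22 == 0:
        return 0.0
    return tree_kernel(t1, t2, decay) / (k11 * k22) ** 0.5

# Toy trees (hypothetical): identical structure, one differing word.
tree_a = ("NP", ("D", "the"), ("N", "dog"))
tree_b = ("NP", ("D", "the"), ("N", "cat"))
similarity = normalized_kernel(tree_a, tree_b, decay=1.0)
```

With `decay=1.0` the kernel simply counts shared fragments, which makes small cases easy to check by hand; values below 1 down-weight large fragments so that the score is not dominated by a few deep matches. A normalized score like this could serve as an edge weight between two content patterns in the content-pattern graph, although the thesis may weight or normalize differently.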
[Degree-granting institution]: Xihua University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1;TP393.092




Article ID: 2497686


Article link: https://www.wllwen.com/guanlilunwen/ydhl/2497686.html

