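Weibo Feature Discovery and Topic Keyword Extraction Based on Language Networks
Published: 2018-06-01 19:54
Thesis topics: Weibo + complex networks; Source: Hangzhou Dianzi University, 2014 master's thesis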
[Abstract]: Weibo (microblogging) is a new form of online media that has emerged in recent years, with advantages such as rapid dissemination and ease of use. With the rapid development of Internet technology, and especially the rapid growth in mobile Internet users, more and more Weibo content is generated every day, and research on that content is becoming increasingly important. This thesis first builds a word co-occurrence network from a massive corpus of Weibo content in order to discover the stylistic features of Weibo language, and then builds a topic keyword extraction network from a corpus of topic-related Weibo content. By analyzing these language networks, it proposes new methods for studying Weibo content and extracting topic keywords, and obtains satisfactory experimental results.
First, the thesis briefly reviews the current state and development of research on language networks and on Weibo content. It analyzes the background knowledge and related techniques of language network research, and then summarizes the methods used to study Weibo content. These fall into two main directions: analyzing the stylistic characteristics of Weibo from a linguistic perspective, and extracting information from Weibo from a text-mining perspective.
Second, the thesis proposes a Weibo feature discovery method based on language networks. Language network analysis generally uses quantitative study of linguistic forms to understand the common topological structure of language networks and the general laws of their evolution. The thesis applies this analysis to Weibo as a form of Internet language: by analyzing the complex-network properties of the language network built from Weibo content, it identifies the linguistic features of that content as a whole.
Third, after summarizing the strengths and weaknesses of existing Weibo keyword extraction methods, the thesis proposes a keyword extraction method based on a topic language network. A language network is first built from the Weibo content related to a topic. Centrality parameters from the small-world analysis of complex networks (betweenness centrality, closeness centrality, and degree centrality) are then combined to form the feature weight of each word. Next, the feature weight value of each word node is computed, and finally the topic keywords are selected according to the magnitude of those values.
Finally, the proposed feature discovery and topic keyword extraction algorithms are evaluated on a large-scale Weibo corpus and on topic-related corpora, and the test results are analyzed. The experimental results show that the algorithms are of practical use for studying Weibo content and for extracting Weibo topic keywords. The thesis closes by summarizing and assessing the work done, and by identifying several open problems in Weibo language networks and topic keyword extraction that merit further study, pointing out directions for future research.
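The keyword extraction steps described in the abstract (build a language network, compute centrality values per word, combine them into a feature weight, rank the words) can be illustrated with a minimal Python sketch. The abstract does not give the network construction rule or the weighting formula, so everything specific here is an assumption: words are linked when they occur in the same post, posts are assumed to be already segmented into words, the three centralities are combined with equal weights, and all function names (`build_cooccurrence_graph`, `keyword_scores`, `top_keywords`) are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_cooccurrence_graph(posts):
    """Link every pair of distinct words that appear in the same post.
    (Assumed co-occurrence window: the whole post.)"""
    graph = {}
    for words in posts:
        unique = sorted(set(words))
        for w in unique:
            graph.setdefault(w, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def _bfs(graph, source):
    """Breadth-first search returning hop distances, shortest-path counts,
    shortest-path predecessors, and the visit order."""
    dist = {source: 0}
    sigma = {v: 0 for v in graph}
    sigma[source] = 1
    preds = {v: [] for v in graph}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def centralities(graph):
    """Return (degree, closeness, betweenness) dicts, each normalized to [0, 1]."""
    n = len(graph)
    degree = {v: len(nb) / (n - 1) if n > 1 else 0.0 for v, nb in graph.items()}
    closeness = {}
    betweenness = {v: 0.0 for v in graph}
    for s in graph:
        dist, sigma, preds, order = _bfs(graph, s)
        total = sum(dist.values())
        # Closeness over the nodes reachable from s.
        closeness[s] = (len(dist) - 1) / total if total > 0 else 0.0
        # Brandes-style dependency accumulation for betweenness.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    # Undirected graph: each pair was counted from both endpoints.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    betweenness = {v: b * scale for v, b in betweenness.items()}
    return degree, closeness, betweenness


def keyword_scores(graph, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine the three centralities linearly. Equal weights are an
    assumption for illustration, not the thesis's formula."""
    degree, closeness, betweenness = centralities(graph)
    a, b, c = weights
    return {v: a * degree[v] + b * closeness[v] + c * betweenness[v] for v in graph}


def top_keywords(posts, k=5):
    """Rank words by combined score (ties broken alphabetically)."""
    scores = keyword_scores(build_cooccurrence_graph(posts))
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    # Toy, made-up posts standing in for segmented topic-related Weibo text.
    toy_posts = [["earthquake", "rescue"], ["earthquake", "donation"],
                 ["earthquake", "news"]]
    print(top_keywords(toy_posts, k=2))
```

Words that connect many otherwise unrelated words score highly on all three measures, which is why a linear combination is a plausible stand-in for the combined feature weight. A real pipeline would also need Chinese word segmentation and stop-word filtering before the network is built.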
[Degree-granting institution]: Hangzhou Dianzi University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.092; TP391.1
Article ID: 1965429
Article link: https://www.wllwen.com/guanlilunwen/ydhl/1965429.html