
Research on Text-Content-Based Burst Topic Detection for Weibo

Posted: 2018-06-23 15:44

Topic tags: Weibo + burst; Source: master's thesis, Hangzhou Dianzi University, 2014


【Abstract】: Because Weibo is open and convenient, it has become an important platform for the spread of online public opinion. Its large volume of information and fast propagation, however, make collecting and managing online public opinion difficult, so detecting burst topics in the Weibo stream promptly and accurately is both a hard and an active research problem. This thesis studies two key techniques in Weibo burst topic detection: detecting emergent topic words and detecting opinion words. The main work covers three aspects.

First, to improve the precision and recall of topic detection, a content-search-based method for detecting emergent topic words is proposed. Taking bursty keywords as clues, the method uses the Lucene retrieval tool to merge the Weibo posts related to each bursty keyword into a single text document, and then extracts topic words from that document with the traditional TF-IDF method. Experiments show that when eight or ten topic words are detected, the F-measure (which balances precision and recall) is 0.87 and 0.84 respectively, and the average F-measure is 13.2% higher than that of an association-rule-based method.

Second, to detect the main opinions expressed in a topic more accurately, a mutual-information-based method for detecting opinion words is proposed. Starting from the sentiment lexicon of Dalian University of Technology, a sentiment lexicon is trained; an improved mutual information measure then computes the degree of association between topic words and sentiment words, and the opinion words most strongly associated with each topic word are selected. Comparative experiments show that measuring the association between topic words and opinion words with mutual information detects the main opinions of a topic more accurately: opinion-word detection reaches a precision of 0.72 and a recall of 0.65, giving an F-measure of 0.68, about 5% higher than the traditional method.

Finally, building on these two methods, a system for online detection of Weibo burst topics is implemented. On the one hand, the system applies the proposed emergent-topic-word and opinion-word detection methods to detect burst topics, which verifies their effectiveness; on the other hand, it provides Weibo content location and content search, so that users can locate the specific posts related to a burst topic.

Taking Weibo text content as its object of study, this thesis proposes a content-search-based method for detecting emergent topic words and a mutual-information-based method for detecting opinion words, and builds an online Weibo burst topic detection system on top of them. The results will help public-opinion monitoring users grasp the latest online public opinion more comprehensively and intuitively, and make public-opinion monitoring on Weibo more convenient.
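The content-search method for emergent topic words can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: posts are assumed to be already word-segmented into token lists, a plain in-memory token match (the hypothetical `search_posts`) stands in for the Lucene query, and IDF is assumed to be computed over the individual posts of the same time window, since the abstract does not say which collection supplies it. The smoothed IDF formula is likewise an assumption.

```python
import math
from collections import Counter


def search_posts(posts, keyword):
    """Hypothetical stand-in for the Lucene query: posts whose tokens include the keyword."""
    return [p for p in posts if keyword in p]


def idf_table(posts):
    """Smoothed IDF (assumed), treating each post as one document: log((1 + N) / (1 + df)) + 1."""
    n = len(posts)
    df = Counter()
    for p in posts:
        df.update(set(p))
    return {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}


def topic_words(posts, burst_keyword, k=8, stopwords=frozenset()):
    """Merge the posts that match a bursty keyword into one document and rank its words by TF-IDF."""
    merged = [w for p in search_posts(posts, burst_keyword) for w in p if w not in stopwords]
    if not merged:
        return []
    idf = idf_table(posts)
    tf = Counter(merged)
    total = len(merged)
    scores = {w: (c / total) * idf[w] for w, c in tf.items()}
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [w for w, _ in ranked[:k]]
```

Merging the matched posts before scoring makes TF reflect how often a word accompanies the bursty keyword across the whole burst, while IDF still down-weights words that are common in every post of the window. The default `k=8` simply mirrors the eight-topic-word setting reported in the abstract.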
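The opinion-word step pairs each topic word with entries from a sentiment lexicon by mutual information. The abstract does not specify the thesis's "improved" mutual information, so the sketch below swaps in plain pointwise mutual information (PMI) estimated from post-level co-occurrence; the function names, the `min_cooccur` threshold and the tiny lexicon in the example are hypothetical.

```python
import math


def pmi(posts, a, b):
    """Pointwise mutual information log2(P(a,b) / (P(a) P(b))), from post-level co-occurrence."""
    n = len(posts)
    n_a = sum(1 for p in posts if a in p)
    n_b = sum(1 for p in posts if b in p)
    n_ab = sum(1 for p in posts if a in p and b in p)
    if n_ab == 0:
        return float("-inf")  # the two words never appear in the same post
    return math.log2(n_ab * n / (n_a * n_b))


def opinion_words(posts, topic_word, lexicon, k=3, min_cooccur=1):
    """Rank sentiment-lexicon words by plain PMI with a topic word (not the thesis's variant)."""
    posts = [set(p) for p in posts]
    scored = []
    for s in lexicon:
        n_ab = sum(1 for p in posts if topic_word in p and s in p)
        if n_ab >= min_cooccur:
            scored.append((pmi(posts, topic_word, s), s))
    # Highest PMI first; ties broken alphabetically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [s for _, s in scored[:k]]


if __name__ == "__main__":
    posts = [
        ["earthquake", "rescue", "moved"],
        ["earthquake", "sad"],
        ["earthquake", "moved", "pray"],
        ["movie", "happy"],
        ["movie", "moved"],
        ["weather", "happy"],
    ]
    lexicon = {"moved", "sad", "happy"}
    print(opinion_words(posts, "earthquake", lexicon))                 # ['sad', 'moved']
    print(opinion_words(posts, "earthquake", lexicon, min_cooccur=2))  # ['moved']
```

The example exposes a known weakness of raw PMI: "sad" co-occurs with "earthquake" in a single post yet outranks "moved", which co-occurs twice. A minimum co-occurrence count is one common mitigation; it is an assumption here, not necessarily the improvement the thesis makes.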
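The reported opinion-word figures are internally consistent if the F-measure is the balanced F1 score, the harmonic mean of precision P and recall R (the abstract does not state the weighting, so β = 1 is an assumption):

```latex
F = \frac{2PR}{P + R}
  = \frac{2 \times 0.72 \times 0.65}{0.72 + 0.65}
  = \frac{0.936}{1.37}
  \approx 0.683
```

This rounds to the 0.68 given in the abstract.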
【Degree-granting institution】: Hangzhou Dianzi University
【Degree】: Master's
【Year conferred】: 2014
【Classification number】: TP393.092; TP391.1



