Research on Chinese Microblog Topic Detection Based on User Authority
Posted: 2018-02-16 07:17
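Keywords: microblog; data acquisition; topic detection; time similarity; user authority. Source: Kunming University of Science and Technology, 2014 master's thesis. Type: degree thesis.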
[Abstract]: With the rapid development of Internet technology, online media have become an indispensable channel of communication in people's daily lives. Among them, microblogging (Weibo), one of the most typical emerging media, is reshaping the pattern of social communication at remarkable speed. Users can update their status anytime and anywhere and join discussions on topics they like or care about, which makes microblogs a gathering place for trending social topics. Topic detection on microblogs can not only provide users with information about hot topics but also give government departments strong data support for emergency monitoring and public-opinion analysis. Studying how to detect hot topics in massive volumes of microblog posts therefore has significant practical value.
Microblog text differs considerably from traditional text. Besides frequent ellipsis, anaphora, and subjective, personalized language, microblog posts are short, and their topics are scattered, real-time, and interactive. Traditional text topic detection methods therefore cannot be applied to microblogs directly. Taking these characteristics of microblogs into account, this thesis proposes a Chinese microblog topic detection method based on user authority.
First, the topic detection algorithm introduces user authority, treating the number of a user's followers as an important indicator of a post's influence; compared with earlier algorithms that analyze only unstructured text, this improves detection accuracy. Second, whereas general topic detection methods pay little attention to time, this thesis treats time as an important parameter and restricts posts to a validity window, which makes topic discovery more precise. Finally, in traditional methods topic detection and topic ranking are independent of each other, so detection results cannot be used directly to rank topics. This thesis introduces the concepts of base energy and correlation energy and combines them into a topic energy, so that once detection is complete, topics can be ranked directly by the magnitude of their energy.
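The abstract treats a user's follower count as the main indicator of a post's influence but does not give the formula that turns it into an authority weight. The Python sketch below is one plausible choice, not the thesis's definition: the function name `user_authority`, the log scaling, and normalization by the largest follower count in the corpus are all assumptions.

```python
import math


def user_authority(followers: int, max_followers: int) -> float:
    """Hypothetical authority score in [0, 1] derived from follower count.

    Assumption: log scaling, so that a few accounts with millions of
    followers do not completely drown out ordinary users, followed by
    normalization against the largest follower count in the corpus.
    """
    if followers < 0 or max_followers < 0:
        raise ValueError("follower counts must be non-negative")
    if max_followers == 0:
        return 0.0
    # Clamp in case a user has more followers than the reference maximum.
    return min(1.0, math.log1p(followers) / math.log1p(max_followers))


if __name__ == "__main__":
    counts = {"a": 0, "b": 99, "c": 9999}
    top = max(counts.values())
    for user, n in counts.items():
        print(user, round(user_authority(n, top), 3))  # a 0.0, b 0.5, c 1.0
```

Log scaling keeps the weight from being dominated by celebrity accounts: here a user with 99 followers receives half the weight of one with 9,999, whereas a linear ratio would give about 1%.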
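The abstract restricts posts to a validity window and lists time similarity among the keywords, but gives neither the window length nor the similarity function. A minimal sketch, assuming a hypothetical 24-hour window and a similarity that falls linearly from 1 for simultaneous posts to 0 at the window boundary (`WINDOW`, `within_window`, and `time_similarity` are illustrative names):

```python
from datetime import datetime, timedelta

# Hypothetical validity window; the abstract does not state its length.
WINDOW = timedelta(hours=24)


def within_window(post_time: datetime, now: datetime,
                  window: timedelta = WINDOW) -> bool:
    """Keep only posts published in the interval (now - window, now]."""
    return now - window < post_time <= now


def time_similarity(t1: datetime, t2: datetime,
                    window: timedelta = WINDOW) -> float:
    """Assumed linear decay: 1.0 for simultaneous posts, 0.0 at or beyond the window."""
    gap = abs((t1 - t2).total_seconds())
    return max(0.0, 1.0 - gap / window.total_seconds())


if __name__ == "__main__":
    now = datetime(2014, 5, 1, 12, 0)
    posts = [now - timedelta(hours=h) for h in (1, 6, 30)]
    recent = [t for t in posts if within_window(t, now)]
    print(len(recent))                                   # 2 (the 30-hour-old post is dropped)
    print(round(time_similarity(recent[0], recent[1]), 3))  # 0.792 for a 5-hour gap
```

An exponential decay would be an equally plausible choice; the linear form simply makes the cut-off at the window boundary explicit.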
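The abstract does not name the clustering procedure used for detection. Single-pass incremental clustering is a common baseline in topic detection and tracking, so the sketch below swaps it in to show how text similarity and time similarity might be combined into one score. Everything here is hypothetical: the `Post` and `Topic` classes, cosine similarity over raw term counts, the weight `alpha = 0.7`, the threshold `0.5`, the 24-hour window, and the assumption that posts arrive already segmented into words with publication times in hours.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Post:
    tokens: list[str]  # assumed: Chinese word segmentation already done upstream
    time: float        # assumed: publication time in hours


@dataclass
class Topic:
    centroid: Counter = field(default_factory=Counter)  # summed term counts
    times: list[float] = field(default_factory=list)
    posts: list[Post] = field(default_factory=list)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def time_sim(t: float, topic_times: list[float], window: float) -> float:
    # Assumed: compare with the topic's mean time, linear decay over the window.
    mean_t = sum(topic_times) / len(topic_times)
    return max(0.0, 1.0 - abs(t - mean_t) / window)


def single_pass(posts: list[Post], threshold: float = 0.5,
                alpha: float = 0.7, window: float = 24.0) -> list[Topic]:
    """Assign each post (in time order) to its best topic or open a new one."""
    topics: list[Topic] = []
    for p in sorted(posts, key=lambda post: post.time):
        vec = Counter(p.tokens)
        best, best_score = None, 0.0
        for topic in topics:
            score = (alpha * cosine(vec, topic.centroid)
                     + (1 - alpha) * time_sim(p.time, topic.times, window))
            if score > best_score:
                best, best_score = topic, score
        if best is not None and best_score >= threshold:
            best.centroid.update(vec)  # Counter.update adds the counts
            best.times.append(p.time)
            best.posts.append(p)
        else:
            topics.append(Topic(Counter(vec), [p.time], [p]))
    return topics


if __name__ == "__main__":
    posts = [
        Post(["earthquake", "kunming"], 0.0),
        Post(["earthquake", "rescue"], 1.0),
        Post(["worldcup", "football"], 2.0),
    ]
    print([len(t.posts) for t in single_pass(posts)])  # [2, 1]
```

With `alpha = 0.7`, time similarity contributes at most 0.3, which is below the 0.5 threshold, so two posts can never be merged on timing alone; some term overlap is always required.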
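The abstract builds a topic energy from a base energy and a correlation energy and ranks topics by it, without giving the formulas. The sketch below assumes E(T) = E_base(T) + λ·E_corr(T), where E_base sums the authority of the authors of the posts in topic T and E_corr sums the pairwise similarity between those posts, using Jaccard overlap of term sets as a stand-in. The class and function names, λ (`lam`), and both component definitions are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Topic:
    name: str
    # Each post: (authority of its author in [0, 1], set of terms in the post).
    posts: list[tuple[float, frozenset[str]]]


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Stand-in similarity between two posts (an assumption, not the thesis's measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def base_energy(topic: Topic) -> float:
    # Assumed: sum of author authority over all posts in the topic.
    return sum(auth for auth, _ in topic.posts)


def correlation_energy(topic: Topic) -> float:
    # Assumed: sum of pairwise similarity between posts, i.e. topic cohesion.
    return sum(jaccard(a, b) for (_, a), (_, b) in combinations(topic.posts, 2))


def rank_topics(topics: list[Topic], lam: float = 1.0) -> list[tuple[str, float]]:
    """Rank topics by E = E_base + lam * E_corr, highest energy first."""
    scored = [(t.name, base_energy(t) + lam * correlation_energy(t)) for t in topics]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    topics = [
        Topic("earthquake", [(0.9, frozenset({"earthquake", "kunming"})),
                             (0.4, frozenset({"earthquake", "rescue"}))]),
        Topic("football", [(0.2, frozenset({"worldcup"})),
                           (0.3, frozenset({"worldcup", "goal"})),
                           (0.1, frozenset({"worldcup", "goal"}))]),
    ]
    print([name for name, _ in rank_topics(topics)])           # ['football', 'earthquake']
    print([name for name, _ in rank_topics(topics, lam=0.0)])  # ['earthquake', 'football']
```

Because correlation energy sums over all pairs, it grows roughly quadratically with topic size, which is why the cohesive three-post topic overtakes the one backed by higher-authority users; dividing by the number of pairs would measure average cohesion instead. The abstract does not say which behavior the thesis intends.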
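[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year conferred]: 2014
[Classification number]: TP393.092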