Extraction and Tracking of Hot Topics Based on Chinese Microblogs
[Abstract]: Since its launch, Weibo has changed the way people get news and learn about current events, thanks to its broad participation. In recent years, many breaking news stories and hot topics have been released through microblogging platforms, which spread information faster and more widely than traditional media. Sina Weibo alone now receives hundreds of millions of posts per day. This huge volume of data covers every aspect of people's lives and contains a great deal of valuable topic information. Extracting hot topics from it correctly is therefore of great significance for understanding the latest public-opinion hotspots and grasping how public opinion is trending. At this scale, however, relying on people to process microblog posts manually is far from sufficient. Moreover, microblog posts are short texts with severe data sparsity, so some traditional topic extraction and tracking algorithms cannot be applied to them directly.
First, this thesis proposes MF-LDA (Microblog Features Latent Dirichlet Allocation), an improved topic extraction model for extracting hot topics from microblogs. MF-LDA improves the traditional LDA (Latent Dirichlet Allocation) model by incorporating five features unique to microblogs: likes, comments, reposts, posting time and user authority. The numbers of likes, reposts and comments are used to compute a post's attention, and user authority is used to compute its authority value. Posts are then divided into time windows according to their posting time, and the word frequencies of the posts in each window are counted. The higher a word's probability, the more likely it is to belong to a hot topic. Second, this thesis tracks hot topics mainly from two aspects: structure and content.
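The feature-weighting and time-window counting steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's MF-LDA model, and it does not perform LDA inference. The abstract gives no formulas, so several parts are hypothetical: the engagement weights (`LIKE_W`, `COMMENT_W`, `REPOST_W`), the log scaling, the assumption that user authority is already normalized to [0, 1], and the way attention and authority are multiplied into one post weight. The `Post` class and the function names are likewise illustrative.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Post:
    """A tokenized microblog post carrying the five features named in the abstract."""
    tokens: list[str]
    likes: int
    comments: int
    reposts: int
    timestamp: int     # posting time in seconds since the epoch
    authority: float   # user authority, ASSUMED pre-normalized to [0, 1]


# HYPOTHETICAL engagement weights; the abstract does not specify the real formula.
LIKE_W, COMMENT_W, REPOST_W = 1.0, 2.0, 3.0


def attention(post: Post) -> float:
    """Log-scaled attention score from likes, comments and reposts (assumed form)."""
    raw = LIKE_W * post.likes + COMMENT_W * post.comments + REPOST_W * post.reposts
    return math.log1p(raw)


def post_weight(post: Post) -> float:
    """Combine attention and authority into one multiplicative weight >= 1 (assumed form)."""
    return (1.0 + attention(post)) * (1.0 + post.authority)


def windowed_word_probs(posts: list[Post], window_seconds: int) -> dict[int, dict[str, float]]:
    """Bucket posts into fixed time windows; return weighted word probabilities per window."""
    counts: dict[int, Counter] = defaultdict(Counter)
    for post in posts:
        window = post.timestamp // window_seconds  # index of the window the post falls into
        weight = post_weight(post)
        for token in post.tokens:
            counts[window][token] += weight  # each occurrence counts as the post's weight
    probs: dict[int, dict[str, float]] = {}
    for window, counter in counts.items():
        total = sum(counter.values())
        probs[window] = {word: c / total for word, c in counter.items()}
    return probs


# Usage: two posts in the first hour, one in the second (window_seconds = 3600).
posts = [
    Post(["earthquake", "rescue"], likes=120, comments=30, reposts=80, timestamp=0, authority=0.9),
    Post(["earthquake", "weather"], likes=2, comments=0, reposts=0, timestamp=100, authority=0.1),
    Post(["movie"], likes=5, comments=1, reposts=0, timestamp=4000, authority=0.2),
]
probs = windowed_word_probs(posts, window_seconds=3600)
# The highest-probability word in each window is its strongest hot-topic candidate.
for window, dist in sorted(probs.items()):
    print(window, max(dist, key=dist.get))
```

Each word occurrence is multiplied by its post's weight. As a result, heavily liked, commented and reposted posts from authoritative users contribute more to a window's word distribution. This is one plausible reading of how the five features bias extraction toward hot content. In the example, "rescue" outranks "weather" even though each appears once.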
To track topic structure, this thesis first constructs the Hot Topic Life Cycle Model (HTLCM). HTLCM divides a topic's life cycle into five stages (birth, growth, maturity, decline and disappearance) by computing the topic's post count per unit time and its growth rate. To track topic content, the thesis integrates the MF-LDA model with HTLCM and proposes the HTT (Hot Topic Tracking) algorithm. Candidate hot topics flagged by HTLCM are assigned to the corresponding time windows according to their publishing time. The data of each window is then fed into MF-LDA to obtain the keywords most relevant to the hot topic in that window. Analyzing how these keywords change across windows reveals how the topic's content evolves. Finally, experiments were carried out on real data sets to verify the validity of the proposed model and algorithm. Under the same conditions, MF-LDA achieves lower perplexity than LDA and a higher coverage rate. The HTT algorithm not only keeps track of hot topics but also discovers potential hot topics effectively. These results show that the proposed model and method are effective and of practical value for hot topic extraction and tracking.
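The HTLCM stage assignment can be illustrated with a simple rule-based sketch. The abstract only says that stages are derived from the per-unit-time post count and its growth rate, so several choices here are hypothetical:

* the thresholds (`active_min`, `rate_thr`);
* the relative growth-rate definition;
* the rule that a topic falling below `active_min` after having been active is treated as disappearing.

`label_life_cycle` is an illustrative name, not a function from the thesis.

```python
from enum import Enum


class Stage(Enum):
    """The five HTLCM life-cycle stages named in the abstract."""
    BIRTH = "birth"
    GROWTH = "growth"
    MATURITY = "maturity"
    DECLINE = "decline"
    DISAPPEARANCE = "disappearance"


def growth_rate(prev: int, curr: int) -> float:
    """Relative change in post count between consecutive windows (guards against /0)."""
    return (curr - prev) / max(prev, 1)


def label_life_cycle(counts: list[int], active_min: int = 5, rate_thr: float = 0.2) -> list[Stage]:
    """Assign one stage per time window from per-window post counts.

    active_min and rate_thr are HYPOTHETICAL thresholds, not values from the thesis.
    """
    stages: list[Stage] = []
    was_active = False  # has the topic ever reached active_min posts in a window?
    prev = 0
    for curr in counts:
        if curr < active_min:
            # Low volume: still emerging if never active, otherwise fading out.
            stage = Stage.DISAPPEARANCE if was_active else Stage.BIRTH
        else:
            was_active = True
            rate = growth_rate(prev, curr)
            if rate >= rate_thr:
                stage = Stage.GROWTH
            elif rate <= -rate_thr:
                stage = Stage.DECLINE
            else:
                stage = Stage.MATURITY
        stages.append(stage)
        prev = curr
    return stages


# Usage: hourly post counts for one topic.
hourly_counts = [1, 3, 10, 25, 27, 26, 12, 4, 0]
stages = label_life_cycle(hourly_counts)
print([s.value for s in stages])
# ['birth', 'birth', 'growth', 'growth', 'maturity', 'maturity', 'decline', 'disappearance', 'disappearance']
```

Windows labeled as growth are natural candidates to pass to a content-tracking step such as HTT, since they mark topics that are still gaining attention. A real implementation would likely smooth the counts first. In this sketch, a single quiet window after the active phase is enough to mark a topic as disappearing.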
[Degree-granting institution]: Xihua University
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: TP391.1; TP393.092
[Author]: Ye Yongtao (叶永涛)
[Article ID]: 2178078
[Article link]: https://www.wllwen.com/guanlilunwen/ydhl/2178078.html