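Research and Application of News Hot Topic Discovery and Evolution Analysis
Published: 2018-03-14 02:38
Topic: LDA model; Focus: hot topic discovery; Source: Nanjing University of Science and Technology, 2017 master's thesis; Document type: degree thesis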
[Abstract]: A hot topic is a topic that attracts wide public attention because of online news coverage. Research on hot topic discovery and evolution helps the public see where public opinion is currently focused and helps the government guide public opinion constructively. It can also prevent people with ulterior motives from exploiting the convenience and uncontrollability of the network to seek improper gain and stir up social conflict. This thesis studies the discovery of hot topics in news and the drift of those topics as they evolve. The main work is as follows:

1. The LDA topic model is introduced, and news reports are modeled as text vectors in two ways: a term-weight model based on TF-IDF and an LDA model based on semantic understanding. On this basis, to address the weakness of the traditional single-core topic description model in describing multi-core topics, a multi-core topic description model is proposed that can identify the different cores of attention within the same topic. A construction method for the model is given: partitional clustering combined with hierarchical clustering is used to cluster news reports precisely. Experiments show that combining multiple text vector modeling methods with the multi-core topic description model improves the clustering of news topics.

2. Based on an analysis of hot topic features, the heat of news is quantified as media coverage heat and netizen attention heat, and a composite attention degree based on both is used to describe the heat of a hot topic. A "topic index" is also introduced, and a segmented topic clustering method based on time windows is used to analyze the life-cycle evolution of hot topics. A topic evolution drift analysis method based on the multi-core topic description model is proposed, which treats evolution as the transfer of core events within a topic. Experiments show that the method identifies the evolution drift of hot topics well.

3. Based on the above results, a subsystem for news hot topic discovery and evolution analysis is designed and implemented. The subsystem is an important functional module of a mobile news monitoring and analysis platform. It integrates news report preprocessing, hot topic discovery, and hot topic evolution analysis, and it can discover current hot topics in real time and present them to users.
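As a rough illustration of the TF-IDF term-weight representation mentioned in the abstract, the following minimal sketch builds TF-IDF vectors for a few tokenized reports and compares them with cosine similarity, the kind of measure a clustering step could rely on. It is not the thesis's implementation: the function names `tfidf_vectors` and `cosine`, the whitespace tokenization, the sample English sentences, and the unsmoothed `log(N / df)` weighting are all assumptions made for the example. Real Chinese news text would first need a word segmenter.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper: weight = (term count / doc length) * log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy, made-up reports; whitespace splitting stands in for real segmentation.
docs = [
    "flood rescue teams reach city".split(),
    "flood rescue donations rise".split(),
    "stock market falls sharply".split(),
]
vecs = tfidf_vectors(docs)

same_topic = cosine(vecs[0], vecs[1])   # two reports sharing "flood", "rescue"
other_topic = cosine(vecs[0], vecs[2])  # no shared terms
print(same_topic > other_topic)
```

In this toy data the two flood reports share terms and so score a higher similarity than the flood and stock-market pair, which share none. A clustering step, such as the combined partitional and hierarchical approach the abstract describes, could group reports using a similarity of this kind.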
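[Degree-granting institution]: Nanjing University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.1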
Article ID: 1609233
Article link: https://www.wllwen.com/shoufeilunwen/xixikjs/1609233.html