Research on Trend Analysis and Prediction of Online Hot Topics
Published: 2019-03-08 17:19
【Abstract】: In recent years, natural language processing of social media content has attracted growing attention. Monitoring and early warning for sudden public events and sensitive online information, together with analyzing and predicting how the sentiment of public opinion shifts, are of considerable research value. Working with Sina Weibo data, this thesis analyzes and computes the sentiment trends of hot topics, builds trend models from historical Weibo data, and predicts how hot topics will develop. Based on the characteristics of Weibo data, hot topics are divided into long-term and short-term topics, and event trend analysis and prediction are carried out separately for each type, with a focus on the features that drive trend development. The main contributions are as follows:
1. A sentiment classification method based on a joint deep learning model is proposed for Weibo data. The method applies convolution to re-serialize the sequence of plain word vectors, yielding word vectors that carry n-gram information. Experiments show that it achieves higher sentiment classification accuracy than the conventional CNN and LSTM methods, and it ranked first in the sentiment classification task of COAE 2016.
2. Trend analysis and trend prediction are performed for short-term hot topics on Weibo. The values of the indicators that affect an event's trend are computed from the sampled data. Time is divided into 2-hour periods, different amounts of historical data are compared, and the best prediction is obtained with four time periods. For trend prediction, features are ranked by category and a regression model is built to predict topic popularity. The experiments compare four prediction methods, including autoregression, GBDT, and CNN. For short-term topics, when predicting the trend over the next 2 hours, the GBDT-based method performs best: counting a prediction as accurate when its error is within 5%, the accuracy reaches 79.1%.
3. For long-term topics, a sub-topic separation prediction method is proposed. An online LDA model is trained on the Weibo data within each time slice to obtain sub-topic evolution and sub-topic intensity. Topic development is divided into four classes, an SVM classification model is built, and the data between different peaks are predicted separately. Experiments show that the method classifies topic popularity with 86% accuracy, and the overall trend prediction also achieves good results.
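The short-term setup in item 2 (2-hour periods, four periods of history, and accuracy counted within a 5% error) can be illustrated with a small Python sketch. This is not the thesis's implementation: the function names, the use of raw post counts as the popularity signal, and the relative-error reading of "within 5%" are all assumptions, and a naive persistence predictor stands in for the GBDT regressor so the example needs only the standard library.

```python
from typing import List, Sequence, Tuple

WINDOW_HOURS = 2     # one time period spans 2 hours (from the abstract)
HISTORY_WINDOWS = 4  # periods of history used as input (from the abstract)
TOLERANCE = 0.05     # assumed meaning: relative error of at most 5%


def bucket_counts(post_hours: Sequence[float], total_hours: int) -> List[int]:
    """Count posts per 2-hour period; post_hours are offsets from topic start."""
    n_buckets = total_hours // WINDOW_HOURS
    counts = [0] * n_buckets
    for hour in post_hours:
        idx = int(hour // WINDOW_HOURS)
        if 0 <= idx < n_buckets:
            counts[idx] += 1
    return counts


def make_samples(
    counts: Sequence[int], history: int = HISTORY_WINDOWS
) -> List[Tuple[List[int], int]]:
    """Pair each period with the `history` periods that precede it."""
    return [
        (list(counts[i - history:i]), counts[i])
        for i in range(history, len(counts))
    ]


def persistence_predict(features: Sequence[int]) -> int:
    """Hypothetical stand-in for a trained regressor: repeat the last period."""
    return features[-1]


def tolerance_accuracy(
    y_true: Sequence[float], y_pred: Sequence[float], tol: float = TOLERANCE
) -> float:
    """Share of predictions whose relative error is at most `tol`."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    hits = sum(
        1 for t, p in zip(y_true, y_pred) if abs(p - t) <= tol * abs(t)
    )
    return hits / len(y_true)
```

A trained regressor would replace `persistence_predict`, taking the four-period feature vectors from `make_samples` as input; `tolerance_accuracy` then scores its outputs in the same way the 79.1% figure is described.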
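【Degree-granting institution】: Harbin Institute of Technology
【Degree level】: Master's
【Year degree awarded】: 2017
【Classification number】: TP391.1
Article ID: 2437029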
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2437029.html