Research on Numerical Sequence Prediction Algorithms Based on Sentiment Analysis of Internet Text
Published: 2018-08-22 18:23
[Abstract]: In the information age, the Internet has become the primary tool for communication, and "Internet Plus" in particular is continually changing the way people live. As 4G networks mature and the number of mobile device users grows, more and more users exchange information through a wide variety of media channels and express their opinions and feelings about products, social events, and services. Because online communication reaches widely, spreads quickly, and involves many users, the volume of data grows explosively. Over a long period of development and accumulation this data comes to embody the collective wisdom of society, so mining Internet big data to analyze the emotional state of users and the sentiment orientation expressed on social media offers predictive power for many social activities. Many problems remain open in sentiment-based prediction research, such as collecting Internet information, assessing the credibility of text, selecting the variables of the prediction model, and prediction sensitivity. To address these problems, this thesis develops data collection and text classification interfaces and proposes univariate and multivariate prediction models based on the sentiment orientation of credible event information, which are used to predict commodity prices and real-estate stock prices. The thesis provides a general-purpose interface for collecting news data from search engines, a price data collector built on the Scrapy framework, and a text credibility classification model. Together these address the generality of text collection across different domains, the collection of information from dynamic web pages, and the credibility of the collected text. Researchers only need to supply keywords for their own field and the XPath of the price data, as specified in the interface documentation, to collect text data and the corresponding price data.
After credibility classification, the collected web text is used to compute a sentiment orientation factor, which then serves as an input to the prediction algorithms. Building on this text analysis, the thesis proposes univariate and multivariate price prediction algorithms based on the sentiment orientation of credible events. Because time series methods require the sample data to be stationary and free of trend, the data are first tested for fit and then differenced to remove trend and achieve stationarity. For the univariate case, the SSA-ARMA model is proposed by incorporating the sentiment orientation factor of credible text, and the optimal autoregressive and moving-average orders are obtained through training. Comparisons across several groups of experiments show that the new model has a smaller error and clearly better predictive performance. To further explain how strongly text sentiment influences prices, the MSA-VAR multivariate prediction model is proposed and used to analyze the impulse responses and fluctuations of multiple variables in the real-estate stock market. The analysis shows that the sentiment factor has a clear effect on the closing price, and experiments show that the MSA-VAR model predicts well and is robust. Finally, the algorithms are applied in a price prediction application for mobile devices, which has high practical value.
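The differencing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which differencing order or implementation the thesis uses, so the function names `difference` and `undifference`, the `order` parameter, and the sample prices below are all assumptions made for illustration only.

```python
from itertools import accumulate


def difference(series, order=1):
    """Apply first differences `order` times: y[t] = x[t] - x[t-1]."""
    result = list(series)
    for _ in range(order):
        result = [b - a for a, b in zip(result, result[1:])]
    return result


def undifference(first_value, diffs):
    """Invert one round of differencing, given the first original value."""
    return list(accumulate(diffs, initial=first_value))


# Hypothetical price series with a linear upward trend (not data from the thesis).
prices = [100.0, 103.0, 106.0, 109.0, 112.0]

# One round of differencing turns the linear trend into a constant series.
diffs = difference(prices)

# Forecasts made on the differenced scale are mapped back by cumulative summation.
restored = undifference(prices[0], diffs)
```

A model such as ARMA would be fitted to `diffs` rather than to `prices`, and its forecasts would be converted back to price levels with `undifference`. Whether one round of differencing is enough depends on the data and would normally be checked with a stationarity test.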
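[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1
Article ID: 2197930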
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2197930.html