Predicting the Direction of the Shanghai Composite Index by Mining Sina Weibo Data
Published: 2018-12-15 14:59
[Abstract]: Social networks have grown rapidly in recent years. In China, Sina Weibo has broad coverage, its content is easy to produce and spreads quickly, and it provides a vast amount of direct and indirect data. This thesis therefore uses Sina Weibo as its data source. It extracts text data from Sina Weibo, combines it with the rise-and-fall information of the Shanghai Composite Index, explores the correlation between the two, and attempts to build a forecasting model that can offer stock market investors some reference information.
Sina Weibo text data is collected mainly with a self-written web crawler. The work focuses on analyzing and solving the problems of user login, advanced search, the limit on the number of accesses per IP per unit time, text extraction, text cleaning, and indicator extraction.
The cleaned Sina Weibo text information and the Shanghai Composite Index closing prices are then combined with an artificial neural network algorithm to build a model that forecasts the index's closing price from Sina Weibo data.
The main innovations of this thesis are:
1. No prior domestic research using Sina Weibo data to forecast the trend of the Shanghai Composite Index has been found. Taking this as its starting point, this thesis uses Sina Weibo data to forecast the index's trend.
2. A distributed-system mechanism is introduced into the crawling of Sina Weibo text content, overcoming the anti-crawler restrictions that Sina Weibo imposes at the user level and the IP level.
3. Because this is a time-series study, the thesis devises a way to search Sina Weibo within a specified time interval for posts matching specified keywords, and successfully crawls the resulting content.
4. The artificial neural network algorithm is customized with a variable data set and an automatic correction feature, which improves the model's prediction accuracy.
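The abstract does not describe how the distributed crawler or the time-bounded keyword search is implemented. Below is a minimal sketch of the scheduling idea only, under stated assumptions: the crawl is split into one-day search windows per keyword, and each crawler identity (assumed to be one logged-in account behind its own IP) must wait a fixed interval between its own requests. Jobs are handed out round-robin, so adding identities shortens the total crawl time while no single account or IP exceeds its limit. `SearchJob`, `plan_jobs`, `schedule`, the keywords, the dates, and the 10-second interval are all hypothetical; no Sina Weibo URL or API is used, and login, fetching, and parsing are omitted.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import cycle


@dataclass(frozen=True)
class SearchJob:
    keyword: str
    day: date  # hypothetical: each search window covers one calendar day


def plan_jobs(keywords, start, end):
    """Split the inclusive date range [start, end] into one-day windows per keyword."""
    jobs = []
    day = start
    while day <= end:
        for kw in keywords:
            jobs.append(SearchJob(kw, day))
        day += timedelta(days=1)
    return jobs


def schedule(jobs, workers, min_interval):
    """Assign jobs round-robin to workers (assumed: one account/IP per worker).

    Each worker waits `min_interval` seconds between its own requests, so the
    per-identity rate limit is respected while total crawl time shrinks as
    workers are added. Returns a list of (start_offset_seconds, worker, job).
    """
    next_free = {w: 0.0 for w in workers}
    plan = []
    for job, worker in zip(jobs, cycle(workers)):
        start_at = next_free[worker]
        plan.append((start_at, worker, job))
        next_free[worker] = start_at + min_interval
    return plan


if __name__ == "__main__":
    jobs = plan_jobs(["上证指数", "股市"], date(2013, 3, 1), date(2013, 3, 3))
    plan = schedule(jobs, ["worker-a", "worker-b", "worker-c"], min_interval=10.0)
    for start_at, worker, job in plan:
        print(f"t+{start_at:.0f}s  {worker}  {job.keyword}  {job.day}")
```

With three workers the six jobs finish their last request at t+10s, whereas a single worker under the same limit would need until t+50s.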
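The abstract does not specify the network architecture, the extracted indicators, or how the "variable data set" and "automatic correction" improvements work, so the following is only a generic sketch and does not reproduce the thesis's model. It assumes a hypothetical daily Weibo sentiment score and the day's min-max scaled closing price as inputs, and the next day's scaled closing price as the target, and fits a plain one-hidden-layer feed-forward network (tanh hidden units, linear output) by backpropagation with per-sample gradient descent. All function names are hypothetical and the data are synthetic toy numbers, not real index or Weibo data.

```python
import copy
import math
import random


def toy_series(n):
    """Synthetic daily sentiment scores and closing prices (illustration only)."""
    sentiment = [0.3 * math.sin(0.9 * t) for t in range(n)]
    closes = [2000.0]
    for t in range(n - 1):
        closes.append(closes[-1] + 50.0 * sentiment[t])
    return closes, sentiment


def make_dataset(closes, sentiment):
    """Features for day t: [scaled close_t, sentiment_t]; target: scaled close_{t+1}.

    For brevity the min-max scaler sees the whole series; in practice fit it
    on the training days only to avoid leaking future prices.
    """
    lo, hi = min(closes), max(closes)
    scaled = [(c - lo) / (hi - lo) for c in closes]
    X = [[scaled[t], sentiment[t]] for t in range(len(closes) - 1)]
    y = [scaled[t + 1] for t in range(len(closes) - 1)]
    return X, y


def init_params(n_in, hidden, seed=0):
    """Small random weights for a one-hidden-layer network."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    return [W1, b1, W2, 0.0]


def forward(params, x):
    """Return (hidden activations, scalar output) for one input vector."""
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return h, sum(w * hj for w, hj in zip(W2, h)) + b2


def mse(params, X, y):
    return sum((forward(params, x)[1] - t) ** 2 for x, t in zip(X, y)) / len(X)


def train(params, X, y, lr=0.05, epochs=300):
    """Per-sample gradient descent on squared error; returns an updated copy."""
    W1, b1, W2, b2 = copy.deepcopy(params)
    for _ in range(epochs):
        for x, t in zip(X, y):
            h, out = forward([W1, b1, W2, b2], x)
            err = out - t  # derivative of 0.5 * (out - t)**2 w.r.t. out
            for j in range(len(W2)):
                # Backprop through tanh, using W2[j] before it is updated.
                grad_z = err * W2[j] * (1.0 - h[j] ** 2)
                W2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_z
                for i in range(len(x)):
                    W1[j][i] -= lr * grad_z * x[i]
            b2 -= lr * err
    return [W1, b1, W2, b2]


if __name__ == "__main__":
    X, y = make_dataset(*toy_series(40))
    params = init_params(n_in=2, hidden=4)
    trained = train(params, X, y)
    print(f"training MSE: {mse(params, X, y):.4f} -> {mse(trained, X, y):.4f}")
```

A real replication would replace `toy_series` with the cleaned Weibo indicators and actual closing prices, and evaluate on held-out later days rather than on the training set.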
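[Degree-granting institution]: Capital University of Economics and Business
[Degree level]: Master's
[Year awarded]: 2014
[Classification number]: TP391.1; F832.51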
Article ID: 2380849
Article link: https://www.wllwen.com/jingjilunwen/touziyanjiulunwen/2380849.html