
Short-Term Stock Market Prediction Based on Text Analysis of Social Networks

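Published: 2018-02-23 01:11

Keywords: stock market; stock comments; sentiment analysis; stock market prediction. Source: Central China Normal University, master's thesis, 2016. Document type: degree thesis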


[Abstract]: The arrival of the Internet era has greatly changed the way we live: people can now obtain almost any information they want online. In particular, as Web technology has gradually moved from Web 1.0 to Web 2.0, financial information has begun to gather and spread on the Internet, and interactive venues such as forums and blogs keep emerging. On stock forums, one of many such platforms, more and more investors post their personal views on the current market, producing a large body of online text with great research value. These posts often contain investors' comments on the stock market and their possible future investment plans, so stock comments of this kind offer an effective way to understand investors' future behavior.

Some scholars in China and abroad have already tried to predict short-term stock market movements by analyzing social networks. Work abroad has mainly focused on the relatively mature European and American markets, and whether its methods can describe the less mature Chinese market remains to be verified; existing work in China is mainly exploratory and lacks systematic, quantifiable prediction. In view of this, this thesis extracts and models text resources related to the domestic stock market and, combined with sentiment analysis, builds a rise/fall prediction model to forecast short-term stock market movements. The main research work and contributions are as follows.

First, the large volume of stock-market comments on the Internet may reflect current market conditions, and these comments can be used to make some prediction of market movements. This thesis proposes methods for modeling stock comment text based on the vector space model and on the word vector model. After the word vectors have been learned, k-means clustering is used to cluster the text into k categories. The thesis then proposes mapping rules from texts to word sets, by which each short text is mapped into a k-dimensional vector space, completing the modeling of the text.
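The clustering and mapping steps described above can be sketched roughly as follows. This is an illustrative sketch only, not the thesis's implementation. It assumes that k-means is run over the learned word vectors so that each cluster forms a word set, and that a text is mapped to a k-dimensional vector by counting how many of its words fall in each set. The toy 2-D vectors, the English tokens, the function names `kmeans` and `text_to_vector`, and the count-based mapping rule are all hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns one cluster index per input point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def text_to_vector(tokens, word_to_cluster, k):
    """Map a tokenized short text to a k-dim vector of per-cluster word counts."""
    vec = [0] * k
    for tok in tokens:
        c = word_to_cluster.get(tok)
        if c is not None:  # words without a learned vector are ignored
            vec[c] += 1
    return vec


# Toy 2-D "word vectors" (assumed; real ones would be learned from stock comments).
word_vectors = {
    "rise": (1.0, 0.1), "surge": (0.9, 0.2), "gain": (1.1, 0.0),
    "fall": (-1.0, 0.1), "drop": (-0.9, -0.1), "loss": (-1.1, 0.0),
}
words = list(word_vectors)
labels = kmeans([word_vectors[w] for w in words], k=2)
word_to_cluster = dict(zip(words, labels))

vec = text_to_vector(["rise", "gain", "drop", "unknown"], word_to_cluster, k=2)
print(vec)
```

The resulting k-dimensional vectors could then be fed to any standard classifier to predict a rise or fall; which classifier the thesis uses is not stated in the abstract.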
Experimental results show that the best accuracy under word-vector modeling, 68%, is significantly higher than the best accuracy under the vector space model, 63.8%, and that both exceed the prediction results reported in the related literature.

Second, the prediction methods above, which rely on simple text features, consider only surface features and have limited ability to describe the deeper information contained in the text. This thesis therefore proposes a stock prediction method that incorporates sentiment analysis. A small number of words with labeled sentiment polarity are selected in advance as seed words, the correlation between words of unknown polarity and the seed words is computed, and a stock sentiment dictionary is generated automatically; the text is then modeled at a deeper level on the basis of this dictionary. Experimental results show that the method incorporating sentiment features achieves clearly higher prediction accuracy than the method based on simple text features alone.
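The seed-word approach to building a sentiment dictionary can be illustrated with a small sketch. The abstract does not say which correlation measure is used; this example assumes pointwise mutual information (PMI) over document-level co-occurrence, in the style of SO-PMI, and scores each word by its PMI with positive seeds minus its PMI with negative seeds. The function name `build_lexicon`, the `smoothing` parameter, and the toy English corpus are hypothetical; real input would be segmented Chinese stock comments.

```python
import math
from collections import Counter
from itertools import combinations


def build_lexicon(docs, pos_seeds, neg_seeds, smoothing=0.5):
    """Score non-seed words: PMI with positive seeds minus PMI with negative seeds."""
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words of a pair
    for doc in docs:
        vocab = set(doc)
        word_df.update(vocab)
        pair_df.update(frozenset(p) for p in combinations(sorted(vocab), 2))

    def pmi(w, s):
        # Smoothing on the joint count avoids log(0) for pairs that never co-occur.
        joint = pair_df[frozenset((w, s))] + smoothing
        return math.log2(joint * n_docs / (word_df[w] * word_df[s]))

    seeds = set(pos_seeds) | set(neg_seeds)
    lexicon = {}
    for w in word_df:
        if w in seeds:
            continue
        pos = sum(pmi(w, s) for s in pos_seeds if s in word_df)
        neg = sum(pmi(w, s) for s in neg_seeds if s in word_df)
        lexicon[w] = pos - neg  # > 0 leans positive, < 0 leans negative
    return lexicon


# Toy tokenized comments (assumed data, for illustration only).
docs = [
    ["rise", "bullish", "buy"],
    ["rise", "bullish"],
    ["fall", "bearish", "sell"],
    ["fall", "bearish"],
    ["rise", "buy"],
    ["fall", "sell"],
]
lexicon = build_lexicon(docs, pos_seeds=["rise"], neg_seeds=["fall"])
for word, score in sorted(lexicon.items()):
    print(word, round(score, 2))
```

Words whose score is close to zero carry little polarity evidence under this measure; a threshold for excluding them from the dictionary would be a further design choice that the abstract does not specify.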
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP391.1






Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1525768.html

