Research on a Classification Method for Breaking Financial Texts Based on Expectation Deviation
Published: 2018-07-21 10:21
[Abstract]: As China's economy develops, financial markets have become increasingly intertwined with people's daily lives. Research shows that breaking financial information quickly causes strong disturbances in financial markets, and that the rapid growth of Internet technology and social networks greatly amplifies this effect. Normally, stock prices rise quickly on positive news and tend to fall on negative news. Recently, however, securities prices have shown an overall decline even in the face of important positive news, which poses a serious challenge to traditional methods based on financial information mining. Under these conditions, traditional text classification methods cannot classify financial news accurately. The reason is that they usually concentrate on the classification model itself, taking text features as model input to predict the class label of a text. To address this problem, this thesis proposes a financial text classification method based on expectation deviation. After introducing the concept of expectation deviation, it matches texts by topic with a topic model, computes the expectation deviation of news items with a descriptive dictionary, and finally obtains an expectation-deviation-based classification model for classifying texts. The main work and results are as follows. First, strong-disturbance resonance filtering and K-means text clustering filtering are used to extract useful breaking news from a large volume of news text, which serves as the initial screening step. Second, because common text classification methods perform poorly in this setting, the thesis proposes a classification method based on expectation deviation. Drawing on an analysis of the LDA topic model, it introduces the concept of topic matching between news texts: the topic clustering results of news texts are used as a prior distribution to predict the topic of each news text and to compute the similarity between the topics of news texts. For texts with similar topics, a dictionary-based measure of the degree of deviation between news texts is then proposed. Finally, LDA-based news topic matching and the deviation measure are combined to construct a classification model for news texts. Experimental results show that, when the financial market behaves abnormally, the proposed method classifies news more accurately.
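The abstract does not give the formulas behind topic matching or the deviation measure, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that each news item already has a topic distribution (for example from an LDA model), that topic matching is done with cosine similarity, and that the descriptive dictionary maps words to signed intensity scores. The dictionary entries, thresholds, and function names are all hypothetical and are not taken from the thesis.

```python
import math

# Hypothetical descriptive dictionary: word -> signed intensity.
# The entries and values are illustrative assumptions, not the thesis's dictionary.
DESCRIPTIVE_DICT = {
    "surge": 2.0,
    "rise": 1.0,
    "stable": 0.0,
    "fall": -1.0,
    "plunge": -2.0,
}


def cosine_similarity(p, q):
    """Cosine similarity between two topic distributions of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def intensity(tokens):
    """Mean dictionary score of the descriptive words found in a token list."""
    scores = [DESCRIPTIVE_DICT[t] for t in tokens if t in DESCRIPTIVE_DICT]
    return sum(scores) / len(scores) if scores else 0.0


def expectation_deviation(actual_tokens, expected_tokens):
    """Signed gap between what the news reports and what was expected."""
    return intensity(actual_tokens) - intensity(expected_tokens)


def classify(actual, expected, sim_threshold=0.8, margin=0.5):
    """Label a news item relative to an earlier expectation.

    `actual` and `expected` are dicts with a "topics" list (topic
    distribution) and a "tokens" list. Both thresholds are assumed values.
    """
    if cosine_similarity(actual["topics"], expected["topics"]) < sim_threshold:
        return "unmatched"  # topics differ too much to compare
    deviation = expectation_deviation(actual["tokens"], expected["tokens"])
    if deviation > margin:
        return "positive"
    if deviation < -margin:
        return "negative"
    return "neutral"


# Earlier coverage expected profits to surge; the actual report only says "rise".
expected_news = {"topics": [0.70, 0.20, 0.10], "tokens": ["profit", "surge"]}
actual_news = {"topics": [0.68, 0.22, 0.10], "tokens": ["profit", "rise"]}
print(classify(actual_news, expected_news))
```

In this toy case the news is good in absolute terms but falls short of the earlier expectation, so the sketch labels it negative. That mirrors the situation described in the abstract, where prices fall on apparently positive news.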
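[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1
Article ID: 2135209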
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2135209.html