A Machine-Learning-Based Stock Ranking Method

Published: 2018-07-29 18:37

[Abstract]: For many years, investors in financial markets have sought to understand the regularities behind stock price movements so that they can analyze and predict them. Investment experts around the world apply a variety of analysis methods and mine massive volumes of stock data to uncover the latent rules of the market and of stock trading, to forecast how the market will evolve, and thereby to maximize returns. This thesis studies the stock data of listed companies and the changes in their share prices. Feature values are computed from each company's price changes over the study period, an optimized version of the k-nearest neighbor (KNN) algorithm is designed, and an uptrend system model is then built to predict the type of price trend of a listed company, so that investors can select listed companies that match their own risk profile. As big data technologies have matured, the choice of platform for processing large-scale stock data sets was driven mainly by memory consumption and computational efficiency. The thesis adopts the HDFS distributed file system of the Hadoop big data platform together with the more efficient MapReduce distributed computing framework, so that the ETL process for the entire data set runs efficiently and conveniently. Machine learning is also a core topic of the thesis: after an in-depth study of the KNN algorithm, a big-data-oriented pattern recognition algorithm is proposed on that basis. In addition, three different feature sets are proposed: minute-level price features, K-line (candlestick) features, and share capital (equity) features. Experiments on a large amount of real stock data show that every feature set is effective for predicting stock price trends. On the same feature set, the big-data-oriented pattern recognition algorithm yields better predictions than the k-nearest neighbor algorithm; across feature sets, the share capital feature set is considerably more accurate than the minute-level price and K-line feature sets. This research provides an effective method for selecting suitable trading targets from a large number of stocks.
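As a rough illustration of the baseline the abstract builds on, the following is a minimal sketch of plain k-nearest neighbor classification of a stock's trend type. It is not the thesis's optimized algorithm, whose details the abstract does not give. The feature vectors (mean daily return, return volatility), the "up"/"down" labels, and the function name `knn_predict` are all assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every labelled training vector.
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    ]
    # Sort by distance only, then let the k closest neighbors vote.
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical features per stock: (mean daily return, return volatility).
train_features = [
    (0.012, 0.020), (0.015, 0.025), (0.010, 0.018),     # labelled "up"
    (-0.011, 0.030), (-0.014, 0.022), (-0.009, 0.027),  # labelled "down"
]
train_labels = ["up", "up", "up", "down", "down", "down"]

prediction = knn_predict(train_features, train_labels, query=(0.011, 0.021), k=3)
print(prediction)
```

In practice the features would usually be scaled to comparable ranges before computing distances, since Euclidean distance is sensitive to the magnitude of each dimension.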
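[Degree-granting institution]: Tianjin Polytechnic University (天津工业大学)
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP181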

[References]