A Study of Cluster Analysis Based on Time-Series Features in Margin Trading and Securities Lending and A-Share Trading

Published: 2018-03-08 16:16

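Topic: margin trading and securities lending　Angle: time-series feature extraction　Source: Shandong University, 2017 master's thesis　Document type: degree thesis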

【Abstract】: Cluster analysis is most often applied to panel data. Clustering time series is different: it is a classification mechanism based on unsupervised learning, and when studying the correlation between series it can uncover deeper, higher-dimensional information. This idea appears in many quantitative trading approaches built on fractal theory and pattern recognition. Traditional methods usually judge the similarity of two time series by computing the point-to-point Euclidean distance between a test series and a target series at corresponding times, and then cluster on that distance. These methods handle poorly the missing points caused by different sampling frequencies, as well as abnormal outliers; the resulting bias is large and the clustering quality is unsatisfactory. This thesis proposes a new way of judging time-series similarity, the feature-factor discrimination method. Features are extracted from each series along several dimensions (statistical, linear, and nonlinear), so that one-dimensional data indexed by time is mapped into a higher-dimensional space and represented by a feature vector. Principal component analysis (PCA) is then applied to remove possible multicollinearity among the feature factors, and the K-means unsupervised learning algorithm partitions the samples into clusters. The time window is then moved and the procedure repeated to obtain a new set of clusters; these are intersected pairwise with the clusters from the previous run, and the pair with the largest intersection share identifies the similar securities being sought. As an example, the thesis studies the relationship between margin trading and securities lending data, from the launch of that business onward, and the price movements of the eligible underlying stocks. Feature factors are extracted to form feature vectors, the data is split into in-sample and out-of-sample sets, and the in-sample feature matrix is used as the input to K-means. In theory, the group of similar securities produced by the clustering should be strongly positively correlated with the market index while also showing strong trending behavior, which makes it a candidate for quantitative timing. Compared with traditional methods, the approach reduces the influence of missing points and outliers in series of different lengths and improves both the accuracy of the clustering and the identification of similar series. To explore the practical application, the similar securities are assembled into a portfolio, and the out-of-sample relationship between this portfolio and the market index, including the spread between the two, is examined. The traditional EMA indicator is interpreted from a frequency-domain perspective and modified using ideas from signal processing: a second-order low-pass filter is built to remove as much high-frequency noise as possible while keeping the low-frequency signal, giving a low-lag moving average. This average is used to time trades on the spread, with the sign of its slope determining a long or short view on the spread portfolio. The out-of-sample performance of the spread portfolio supports the feasibility and reasonableness of using the low-lag moving-average system to time the spread between the clustered portfolio and the CSI 300 market index.
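The clustering procedure described in the abstract (feature extraction, PCA, K-means, then comparing clusters across moving windows) can be sketched roughly as follows. This is a minimal illustration under several assumptions that do not come from the thesis: the six features in `extract_features` are placeholders for the unspecified statistical, linear, and nonlinear factors; the 90% variance threshold and the number of clusters are arbitrary; the "intersection share" is taken to be intersection over union; and the data is synthetic.

```python
import numpy as np

def extract_features(series):
    """Map one 1-D series to a fixed-length vector (hypothetical feature set)."""
    x = np.asarray(series, dtype=float)
    x = x[~np.isnan(x)]                       # missing points are simply dropped
    r = np.diff(x)                            # first differences of the series
    mu, sd = r.mean(), r.std()
    z = (r - mu) / sd
    skew, kurt = np.mean(z ** 3), np.mean(z ** 4) - 3.0       # statistical
    acf1 = np.corrcoef(r[:-1], r[1:])[0, 1]                   # linear
    acf1_sq = np.corrcoef(r[:-1] ** 2, r[1:] ** 2)[0, 1]      # nonlinear proxy
    return np.array([mu, sd, skew, kurt, acf1, acf1_sq])

def pca_reduce(F, var_kept=0.9):
    """Standardize the feature matrix and keep the leading principal components."""
    Z = (F - F.mean(axis=0)) / F.std(axis=0)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(ratio, var_kept)) + 1   # smallest k reaching var_kept
    return Z @ Vt[:k].T                             # mutually uncorrelated scores

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def best_overlap(labels_a, labels_b, names):
    """Intersect clusters of two windows pairwise; return the largest share."""
    def groups(labels):
        out = {}
        for name, lab in zip(names, labels):
            out.setdefault(int(lab), set()).add(name)
        return out.values()
    best = (0.0, [])
    for ga in groups(labels_a):
        for gb in groups(labels_b):
            share = len(ga & gb) / len(ga | gb)     # assumed: Jaccard share
            if share > best[0]:
                best = (share, sorted(ga & gb))
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    names = [f"S{i:02d}" for i in range(12)]        # hypothetical tickers
    prices = 100 + rng.normal(0, 1, size=(12, 240)).cumsum(axis=1)
    labels = []
    for window in (prices[:, :120], prices[:, 120:]):   # two time windows
        F = np.vstack([extract_features(s) for s in window])
        labels.append(kmeans(pca_reduce(F), k=3))
    print(best_overlap(labels[0], labels[1], names))
```

Because every series is reduced to a feature vector of the same length, series with different lengths or gaps can be compared directly, which is the property the abstract contrasts with point-to-point Euclidean distance.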
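The timing step can be illustrated in the same spirit. The abstract does not give the filter coefficients, so the sketch below assumes one widely used second-order low-lag form (the recursion of Ehlers' instantaneous trendline, with `alpha` playing the role of the EMA smoothing constant); the thesis's actual filter may differ. The functions `low_lag_trendline` and `slope_signal`, the seeding of the first two values, and the sample spread are all hypothetical.

```python
def low_lag_trendline(prices, alpha=0.1):
    """Second-order low-pass filter with unit DC gain (assumed coefficient form).

    y[t] = (a - a^2/4) p[t] + (a^2/2) p[t-1] - (a - 3a^2/4) p[t-2]
           + 2(1 - a) y[t-1] - (1 - a)^2 y[t-2]
    """
    p = [float(v) for v in prices]
    out = p[:2]                               # seed the recursion with raw values
    b0 = alpha - alpha ** 2 / 4
    b1 = alpha ** 2 / 2
    b2 = -(alpha - 3 * alpha ** 2 / 4)
    a1 = 2 * (1 - alpha)
    a2 = -(1 - alpha) ** 2
    for t in range(2, len(p)):
        out.append(b0 * p[t] + b1 * p[t - 1] + b2 * p[t - 2]
                   + a1 * out[t - 1] + a2 * out[t - 2])
    return out

def slope_signal(trend):
    """+1 (long the spread) when the trendline rises, -1 (short) when it falls."""
    if not trend:
        return []
    signal = [0]                              # no slope is defined at the first point
    for prev, cur in zip(trend, trend[1:]):
        signal.append((cur > prev) - (cur < prev))
    return signal

if __name__ == "__main__":
    # Hypothetical spread: clustered portfolio value minus index value.
    spread = [0.0, 0.4, 0.3, 0.9, 1.2, 1.1, 1.6, 1.4, 0.9, 0.5, 0.6, 0.1]
    trend = low_lag_trendline(spread, alpha=0.3)
    print(slope_signal(trend))
```

With these coefficients the feedforward and feedback terms both sum to `alpha ** 2`, so a constant input passes through unchanged, and a linear ramp is tracked without lag once the recursion is seeded on the ramp. A plain EMA, by comparison, trails a ramp by a fixed offset.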
【Degree-granting institution】: Shandong University
【Degree level】: Master's
【Year conferred】: 2017
【Classification numbers】: F832.51; F224

【References】