Variable Selection Methods in the Cox Model and an Empirical Study of the Stock Market
Published: 2018-04-13 13:25
Topics: Cox proportional hazards regression model + variable selection; Source: master's thesis, Zhongnan University of Economics and Law, 2017
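【Abstract】: In recent years, survival analysis methods and techniques have been widely applied in epidemiology and clinical medicine, and researchers have gradually introduced them into demography, actuarial science, economics and other fields, but their use in finance is still limited. This thesis applies the Cox proportional hazards regression model to stock trading data, using the constituent stocks of the CSI 300 Index as the sample, with the aim of identifying the important factors that affect stock survival time and comparing the strengths and weaknesses of variable selection methods in the Cox model, in order to find more suitable methods for studying the stock market. First, numerical simulation experiments are conducted under two scenarios, mutually independent covariates and correlated covariates, to examine the variable selection performance of the Lasso and Elastic Net methods within the Cox proportional hazards model and to verify the grouping effect of the Elastic Net, in preparation for the empirical analysis of the CSI 300 constituent stock data. Next, 30 financial indicators are collected for each stock from the Guotai Junan database; taking the first quarter of 2016 as the observation period and defining a survival time for CSI 300 stocks, the survival time and survival status of each stock in that quarter are obtained, yielding the basic stock dataset. From the 2016 Q1 data, the correlation coefficients among the 30 financial indicators are computed, and descriptive statistics of the covariates are produced to characterize their basic features. Three methods, Cox stepwise regression, the Lasso and the Elastic Net, are then applied in the empirical analysis. The models are estimated with the coordinate descent algorithm, and 10-fold cross-validation is used to choose the tuning parameters; this identifies the important covariates affecting stock survival time, whose effects are then analyzed for magnitude and direction. Finally, the three methods are compared and the important covariates selected by all three are summarized. The Lasso and Elastic Net perform better at variable selection than Cox stepwise regression: the covariate sets they select are more parsimonious, with no redundant variables. The variables selected by Cox stepwise regression exhibit multicollinearity, indicating that this method is ill-suited to correlated predictors, whereas the variables selected by the Lasso are uncorrelated, indicating that the Lasso handles collinear predictors well. The Elastic Net has a notable property, the grouping effect: it can bring correlated, even strongly correlated, covariates into the model together. The Lasso lacks this property; from a group of correlated variables it tends to select only one rather than selecting them jointly. The advantage of the Elastic Net over the Lasso is especially pronounced when the data are high-dimensional, have small samples and contain strongly correlated covariates. In terms of model fit, both the Lasso and the Elastic Net outperform Cox stepwise regression, and the Lasso gives the best fit.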
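For reference, the penalized objective behind Lasso and Elastic Net Cox fits, and a coordinate-wise update of the kind used by coordinate descent, can be written in standard notation as below. This is a sketch assuming no tied event times and the glmnet-style parameterization with one penalty level λ and a mixing weight α; the thesis may instead use the equivalent (λ₁, λ₂) form of the Elastic Net, and its own notation is not shown in this abstract. Here δ_i is the event indicator, R(t_i) the risk set at time t_i, and g_j, h_j the first and second partial derivatives of the scaled negative log partial likelihood with respect to β_j at the current β.

```latex
\ell(\beta)=\sum_{i:\,\delta_i=1}\Big[x_i^{\top}\beta-\log\sum_{k\in R(t_i)}\exp\big(x_k^{\top}\beta\big)\Big]

\hat\beta=\arg\min_{\beta}\Big\{-\tfrac{1}{n}\,\ell(\beta)
  +\lambda\Big[\alpha\,\|\beta\|_1+\tfrac{1-\alpha}{2}\,\|\beta\|_2^2\Big]\Big\},
\qquad \alpha=1:\ \text{Lasso},\quad 0<\alpha<1:\ \text{Elastic Net}

\beta_j\leftarrow\frac{S\big(h_j\beta_j-g_j,\ \lambda\alpha\big)}{h_j+\lambda(1-\alpha)},
\qquad S(z,\gamma)=\operatorname{sign}(z)\,\big(|z|-\gamma\big)_{+}
```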
[Abstract]: In recent years, survival analysis methods and techniques have been widely used in epidemiology and clinical medicine. Researchers have gradually introduced them into demography, actuarial science, economics and other fields, but these methods have not yet been widely used in finance. This thesis applies the Cox proportional hazards regression model to stock trading data, taking the constituent stocks of the CSI 300 Index as the sample, in order to identify the important factors affecting stock survival time, compare variable selection methods for the Cox model, and find a more appropriate way to study the stock market. First, numerical simulation experiments are carried out for two cases, independent covariates and correlated covariates, to explore the variable selection performance of the Lasso and Elastic Net methods in the Cox proportional hazards model, and the grouping effect of the Elastic Net is verified, in preparation for the empirical analysis of the CSI 300 constituent stock data. Then 30 financial indicators are collected for each stock from the Guotai Junan database; taking the first quarter of 2016 as the observation period and defining the survival time of a CSI 300 stock, the survival time and survival status of each stock in that quarter are obtained and the required stock dataset is assembled. From the 2016 Q1 data, the correlation coefficients of the 30 financial indicators are computed, and descriptive statistics of the covariates are produced to understand their basic characteristics. Next, Cox stepwise regression, the Lasso and the Elastic Net are used for the empirical analysis. The coordinate descent algorithm is used to fit the models, and 10-fold cross-validation is used to choose the tuning parameters. This yields the important covariates influencing stock survival time, and the magnitude and direction of their effects are analyzed. Finally, the advantages and disadvantages of the three methods are compared, and the important covariates selected by all three are summarized. The Lasso and Elastic Net select variables better than Cox stepwise regression, and the covariate sets they select are more parsimonious, with no redundant variables. The variables selected by Cox stepwise regression exhibit multicollinearity, which shows that this method is ill-suited to correlated predictors, whereas the variables selected by the Lasso are uncorrelated, which shows that the Lasso handles collinear predictors well. The Elastic Net has a notable feature, the grouping effect: covariates that are correlated, even strongly correlated, can be selected into the model together. The Lasso lacks this property; it tends to select only one variable from a group of correlated variables rather than selecting them together. Especially when the data are high-dimensional, with small samples and strong correlation, the Elastic Net outperforms the Lasso. In terms of model fit, the Lasso and Elastic Net both outperform Cox stepwise regression, and the Lasso gives the best fit.
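The workflow described above (coordinate descent for penalized Cox models, 10-fold cross-validation for the tuning parameter, and a simulation with correlated covariates) can be sketched as follows. This is a minimal illustration assuming NumPy, no tied event times, and the glmnet-style (lam, alpha) penalty; it is not the thesis's code. The function names `soft_threshold`, `log_partial_lik`, `cox_enet` and `cv_lambda` are hypothetical, the cross-validation criterion (the Verweij and van Houwelingen cross-validated partial likelihood) is one common choice that the abstract does not confirm, and the simulated data settings are invented for illustration.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Lasso shrinkage operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def log_partial_lik(X, time, event, beta):
    """Cox log partial likelihood; assumes no tied event times."""
    order = np.argsort(time)
    eta = X[order] @ beta
    ev = np.asarray(event)[order].astype(bool)
    m = eta.max()
    # after sorting by time, the risk set of subject i is i..n-1: reverse cumsum
    log_s0 = np.log(np.cumsum(np.exp(eta - m)[::-1])[::-1]) + m
    return float(np.sum(eta[ev] - log_s0[ev]))


def cox_enet(X, time, event, lam, alpha=1.0, n_sweeps=500, tol=1e-6):
    """Cyclic coordinate descent (hypothetical helper) for
        -(1/n) * log partial likelihood + lam * (alpha*|b|_1 + (1-alpha)/2*|b|_2^2).
    alpha=1 is the Lasso; 0 < alpha < 1 is the Elastic Net. No tied times assumed.
    """
    order = np.argsort(time)
    X = X[order]
    ev = np.asarray(event)[order].astype(bool)
    n, p = X.shape
    beta, eta = np.zeros(p), np.zeros(n)  # eta = X @ beta, kept up to date
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(p):
            w = np.exp(eta - eta.max())  # shift by max for numerical stability
            xj = X[:, j]
            s0 = np.cumsum(w[::-1])[::-1]
            s1 = np.cumsum((w * xj)[::-1])[::-1]
            s2 = np.cumsum((w * xj**2)[::-1])[::-1]
            mean = s1[ev] / s0[ev]  # risk-set weighted mean of x_j at each event
            grad = -np.sum(xj[ev] - mean) / n
            hess = np.sum(s2[ev] / s0[ev] - mean**2) / n
            if hess <= 1e-12:  # x_j is constant within the risk sets: no information
                continue
            # one-dimensional proximal Newton step (no line search in this sketch)
            new = soft_threshold(hess * beta[j] - grad, lam * alpha) / (hess + lam * (1 - alpha))
            step = new - beta[j]
            if step != 0.0:
                eta += step * xj
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return beta


def cv_lambda(X, time, event, lams, alpha=1.0, k=10, seed=0):
    """k-fold CV with the Verweij and van Houwelingen criterion:
    sum over folds f of l(b_-f) - l_-f(b_-f); the largest score wins."""
    event = np.asarray(event)
    folds = np.random.default_rng(seed).permutation(len(time)) % k
    scores = []
    for lam in lams:
        score = 0.0
        for f in range(k):
            tr = folds != f
            b = cox_enet(X[tr], time[tr], event[tr], lam, alpha)
            score += log_partial_lik(X, time, event, b) - log_partial_lik(X[tr], time[tr], event[tr], b)
        scores.append(score)
    return lams[int(np.argmax(scores))], scores


# Simulated data (illustrative only): x1, x2 correlated at about 0.99; x3..x6 are noise.
rng = np.random.default_rng(1)
n = 300
z = rng.normal(size=n)
X = np.column_stack([z + 0.1 * rng.normal(size=n),
                     z + 0.1 * rng.normal(size=n),
                     rng.normal(size=(n, 4))])
X = (X - X.mean(axis=0)) / X.std(axis=0)
t_event = rng.exponential(1.0 / np.exp(X[:, 0] + X[:, 1]))  # true beta = (1, 1, 0, 0, 0, 0)
t_cens = rng.exponential(2.0, size=n)  # independent exponential censoring
time, event = np.minimum(t_event, t_cens), t_event <= t_cens

lam, _ = cv_lambda(X, time, event, [0.02, 0.05, 0.1], alpha=0.5)
print("lasso:", np.round(cox_enet(X, time, event, lam, alpha=1.0), 3))
print("enet :", np.round(cox_enet(X, time, event, lam, alpha=0.5), 3))
```

Comparing the two printed coefficient vectors is one way to look for the grouping effect: the ridge part of the Elastic Net penalty tends to pull the coefficients of x1 and x2 toward each other, while the Lasso may concentrate weight on one of them. The exact numbers depend on the seed and settings, so none are claimed here.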
【Degree-granting institution】: Zhongnan University of Economics and Law
【Degree level】: Master's
【Year conferred】: 2017
【Classification number】: F832.51
Article No.: 1744706
Article link: https://www.wllwen.com/jingjilunwen/huobiyinxinglunwen/1744706.html