Variable Selection Methods Based on Penalized Likelihood and Their Application to High-Dimensional Data
[Abstract]: With the rapid development of information technology, we can obtain ever more data with ever higher variable dimension. How to choose the best model from many candidates has become an important research topic in econometrics. A good variable selection method can overcome the problems of traditional methods, such as heavy computation and overfitting: the selected model predicts accurately, excludes irrelevant variables effectively, and is as parsimonious as possible. Because it is a continuous optimization process, the penalized likelihood method is more stable than traditional discrete methods, and with a suitable algorithm it can be carried out efficiently even when the number of variables is large. For high-dimensional data models, model selection by penalized likelihood is therefore more effective, accurate, and stable. Based on the penalized likelihood method, this paper studies variable selection for several kinds of high-dimensional data models. The resulting methods perform model selection and parameter estimation simultaneously. Using probability theory and mathematical statistics, we prove that the estimators have the oracle property: the correct model is selected with probability approaching 1, and the estimator is asymptotically normal. The main conclusions are as follows.
First, an adaptive bridge estimation method for high-dimensional data models is proposed. Inspired by bridge estimation, it applies different weights to the penalty term according to the importance of each variable. We examine whether the adaptive bridge estimator meets the criterion for a good estimator, namely the oracle property (correct model selection with probability approaching 1 and asymptotic normality), and prove that it does under suitable conditions. Simulations and real data demonstrate the good numerical and empirical performance of the method.
Second, we study M-estimation for the linear regression model with high-dimensional data and discuss the properties of the estimator when the penalty term is handled by local linear approximation. M-estimation is a general framework that includes least squares estimation and Huber regression. When the data contain outliers or the error term follows a heavy-tailed distribution, certain special cases of M-estimation are more robust than least squares. We prove theoretically that, under certain conditions, the estimator obtained by combining local linear approximation of the penalty with an M-estimation objective function has good large-sample properties. Simulations with a suitable algorithm show that the method is more robust; for ultra-high-dimensional data models, they also show that backward regression together with the proposed method performs better. In the empirical part, real data show that the proposed method selects variables and estimates parameters well.
Finally, this paper studies the identification of credit default customers based on the logistic model. The logistic model, which is commonly used in credit scoring, is chosen to identify the factors that influence credit default and to measure and predict the default risk of credit customers. The numerical simulation results show that the proposed variable selection method is effective. The empirical results also show that the variable selection methods for high-dimensional data models proposed in this paper select models with greater interpretability and predictive ability.
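The abstract does not give formulas or algorithms, so the following is only a minimal sketch of how the second idea (an M-estimation loss combined with a local linear approximation of the penalty) might look in practice. Everything specific here is an assumption made for illustration: the Huber loss as the M-estimation objective, a bridge-type penalty lam * |b|**gamma as the penalty being linearized, a plain proximal gradient solver, and all function names and default values. It is not the dissertation's own algorithm.

```python
import numpy as np

def huber_loss(r, delta=1.345):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    r = np.asarray(r, dtype=float)
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * a - 0.5 * delta**2)

def soft_threshold(z, t):
    """Proximal operator of a weighted L1 penalty (elementwise)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_l1_huber(X, y, weights, delta=1.345, n_iter=500, beta0=None):
    """Proximal gradient for mean(huber(y - X b)) + sum_j weights[j] * |b_j|."""
    n, p = X.shape
    weights = np.asarray(weights, dtype=float)
    beta = np.zeros(p) if beta0 is None else np.array(beta0, dtype=float)
    # Step size 1/L, where L = ||X||_2^2 / n bounds the gradient's Lipschitz constant.
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        psi = np.clip(y - X @ beta, -delta, delta)  # derivative of the Huber loss
        grad = -(X.T @ psi) / n
        beta = soft_threshold(beta - step * grad, step * weights)
    return beta

def lla_weights(beta_init, lam, gamma=0.5, eps=1e-8):
    """Derivative of the assumed bridge penalty lam * |b|**gamma at |beta_init|.

    eps keeps the weight finite when a coefficient is exactly zero.
    """
    return lam * gamma * (np.abs(beta_init) + eps) ** (gamma - 1.0)

def lla_huber_fit(X, y, lam, gamma=0.5, n_lla=3):
    """Local linear approximation: repeatedly solve a reweighted L1 problem."""
    p = X.shape[1]
    beta = weighted_l1_huber(X, y, np.full(p, lam))  # initial Huber-lasso fit
    for _ in range(n_lla):
        beta = weighted_l1_huber(X, y, lla_weights(beta, lam, gamma), beta0=beta)
    return beta

if __name__ == "__main__":
    # Hypothetical data: sparse signal with heavy-tailed (Student t, 2 df) errors.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    beta_true = np.zeros(20)
    beta_true[:3] = [3.0, -2.0, 1.5]
    y = X @ beta_true + rng.standard_t(df=2, size=100)
    print(np.round(lla_huber_fit(X, y, lam=0.1), 2))
```

Each local linear approximation step replaces the nonconvex penalty by a weighted L1 penalty whose weights come from the previous estimate, so small coefficients are penalized heavily and large ones lightly. The Huber loss limits the influence of large residuals, which is the robustness property the abstract attributes to M-estimation under outliers and heavy-tailed errors.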
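[Degree-granting institution]: University of International Business and Economics
[Degree level]: Doctoral
[Year degree awarded]: 2017
[Classification number]: F224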
Article ID: 2220773
Article link: https://www.wllwen.com/shoufeilunwen/jjglss/2220773.html