Applied Research on Partial Least Squares and Sparse Partial Least Squares Regression
Published: 2018-01-29 21:29
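Keywords: partial least squares regression model; sparse partial least squares regression model; electricity demand in Yunnan Province; cross-validation. Source: Kunming University of Science and Technology, 2015 master's thesis. Document type: degree thesis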
[Abstract]: High-dimensional, complex data now arise widely across scientific fields, which requires statisticians to seek new statistical modeling methods. A potential difficulty in handling high-dimensional data is how to deal with multicollinearity among the predictor variables. Partial least squares (PLS) regression is a generalization of traditional multiple linear regression and is well suited to the statistical analysis of strongly correlated data. During modeling, PLS uses information synthesis and screening techniques to extract from the original variables several new components that have the greatest explanatory power for the system, and then builds the model on these new composite variables. PLS can therefore be regarded as a synthesis of multiple linear regression, principal component analysis, and canonical correlation analysis. Using randomly simulated data and electricity data from Yunnan Province, this thesis studies the PLS model in detail in terms of its modeling principle, model solution, algorithm, algorithm simulation, parameter tuning, and data analysis. It also compares multiple linear regression with the PLS model using criteria such as cross-validation and mean squared error. The data analysis shows that PLS is clearly superior when the predictor variables are strongly collinear. The other research focus of this thesis is sparse partial least squares (SPLS) regression. Because each new PLS component is a linear combination of all the original predictor variables, a large number of predictors makes the model harder to interpret and makes it harder to identify the most important predictors. Sparse partial least squares is an improvement on partial least squares.
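The comparison of multiple linear regression and PLS by cross-validated mean squared error, described above, can be illustrated with a minimal sketch. This is not the thesis's own code. It fits single-response PLS with a NIPALS-style loop and compares it with ordinary least squares under 5-fold cross-validation. The simulated data (40 samples, 30 predictors driven by 2 latent factors), the choice of 2 components, and all function names are assumptions made for this example.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Single-response PLS (PLS1) via NIPALS; returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # centered working copies
    p = X.shape[1]
    W = np.zeros((p, n_components))          # weight vectors
    P = np.zeros((p, n_components))          # X loadings
    q = np.zeros(n_components)               # y loadings
    for k in range(n_components):
        w = Xk.T @ yk                        # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                           # new component (score vector)
        tt = t @ t
        P[:, k] = Xk.T @ t / tt
        q[k] = yk @ t / tt
        W[:, k] = w
        Xk = Xk - np.outer(t, P[:, k])       # deflate X
        yk = yk - q[k] * t                   # deflate y
    coef = W @ np.linalg.solve(P.T @ W, q)   # coefficients on the original predictors
    return coef, x_mean, y_mean

def pls_predict(X_train, y_train, X_test, n_components=2):
    coef, x_mean, y_mean = pls1_fit(X_train, y_train, n_components)
    return (X_test - x_mean) @ coef + y_mean

def ols_predict(X_train, y_train, X_test):
    A = np.column_stack([np.ones(len(y_train)), X_train])   # add intercept column
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta

def cv_mse(fit_predict, X, y, n_folds=5, seed=0):
    """Mean squared prediction error averaged over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))

# Assumed simulation: 30 predictors generated from 2 latent factors plus small noise.
rng = np.random.default_rng(0)
n, p = 40, 30
latent = rng.normal(size=(n, 2))
loadings = rng.normal(size=(2, p))
X = latent @ loadings + 0.05 * rng.normal(size=(n, p))   # strongly collinear columns
y = latent @ np.array([1.0, -1.0]) + 0.5 * rng.normal(size=n)

mse_ols = cv_mse(ols_predict, X, y)
mse_pls = cv_mse(pls_predict, X, y)
print(f"5-fold CV MSE  OLS: {mse_ols:.3f}  PLS (2 components): {mse_pls:.3f}")
```

With nearly as many predictors as training samples, the least-squares fit in this setup is unstable, so the PLS error would be expected to come out lower. The exact numbers depend on the simulated data.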
Sparse PLS shrinks the coefficients estimated by PLS so that those that are small in absolute value become exactly zero, which removes the corresponding variables from the model. This thesis studies the sparse PLS algorithm and its implementation. Following the same approach used for PLS, it carries out a comprehensive comparison of multiple regression, PLS, and sparse PLS models, and uses the Yunnan Province electricity data to identify the most important factors affecting electricity consumption. The regression results on simulated data show that both PLS regression and sparse PLS regression can effectively handle collinearity among variables. Of the two, the sparse PLS regression model fits better and predicts more accurately. The study of the factors influencing electricity consumption in Yunnan Province shows that electricity demand in the province keeps growing with the development of the provincial economy, the growth of total retail sales of consumer goods, and the increase in fixed-asset investment. Urbanization in Yunnan Province has likewise driven up society-wide demand for electricity. A rise in the consumer price index also has a positive effect on electricity demand, but the effect is small and can be neglected.
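The shrink-to-zero behavior described above can be illustrated with a minimal sketch. It soft-thresholds the first PLS direction vector, which is one common way sparse PLS is formulated for a single response. The abstract does not say which algorithm the thesis uses, so this choice, the parameter name `eta`, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def first_direction(X, y, eta=0.0):
    """First PLS direction vector, soft-thresholded when eta > 0.

    eta in [0, 1) is a hypothetical sparsity parameter: entries whose absolute
    value is below eta * max|z| are set exactly to zero, the rest are shrunk.
    """
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    z = Xc.T @ yc                                        # dense covariance direction
    threshold = eta * np.max(np.abs(z))
    w = np.sign(z) * np.maximum(np.abs(z) - threshold, 0.0)   # soft-thresholding
    return w / np.linalg.norm(w)                         # unit length (largest entry survives for eta < 1)

# Assumed simulation: only the first 3 of 10 predictors influence the response.
rng = np.random.default_rng(1)
n, p = 500, 10
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - 2 * X[:, 1] + 2 * X[:, 2] + 0.5 * rng.normal(size=n)

w_dense = first_direction(X, y, eta=0.0)    # ordinary PLS direction: every weight nonzero
w_sparse = first_direction(X, y, eta=0.5)   # thresholded direction: small weights become zero

print("nonzero weights, PLS direction:   ", np.count_nonzero(w_dense))
print("nonzero weights, sparse direction:", np.count_nonzero(w_sparse))
print("selected predictors:", np.flatnonzero(w_sparse))
```

Only the first direction is shown. A full sparse PLS fit would typically go on to build components from the selected predictors and choose the threshold and the number of components by cross-validation.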
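[Degree-granting institution]: Kunming University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: O212.1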

