
An Interaction Study of Second-Hand Housing in Beijing Based on Machine Learning and GAM Methods

Published: 2019-03-25 13:30
[Abstract]: In recent years China's economy has grown rapidly and living standards have risen steadily, which has also stimulated demand for investment. Real estate has become a major investment target, and this in turn has pushed up property prices. Since the 2008 economic crisis in particular, housing prices in Beijing have soared, with some homes priced at more than 100,000 yuan per square meter, and housing pressure in the city is severe. As of May 2016, second-hand homes accounted for as much as 80% of market transactions in Beijing, and second-hand home prices had multiplied several times within only a few years.

To find a model well suited to studying price differences among second-hand homes in Beijing, and to examine how the influencing factors produce those differences, this thesis uses data on 16,210 second-hand homes in six urban districts of Beijing from May 2016. K-means clustering is first used to analyze housing types. Six methods are then built: ordinary least squares (OLS) linear regression, log OLS, K-nearest neighbor (KNN) regression, log KNN regression, the nonlinear generalized additive model (GAM), and log GAM, where "log" means the house price is log-transformed. Each method is studied in two settings, with and without interaction terms among the collected predictors, giving twelve models in total, and a stability method is then used to select the best model. Finally, the OLS, log OLS, GAM, and log GAM models are fitted for analysis.

The results show that the collected homes fall into four types: prime-location, suburban, mass-market, and large-unit. In terms of generalization ability, log KNN regression is best among the models without interactions, log GAM is best among the models with interactions, and log GAM is the best of all twelve models. In terms of interpretation, the GAM models reveal complex nonlinear relationships between the continuous predictors and house prices, regardless of whether interaction terms are included or whether the price is log-transformed. In terms of goodness of fit, the log GAM model with interactions fits best. Models with interactions predict better than models without them, and interaction effects exist among several predictors. Studying these interaction effects yields useful information. For example, the linear model with interactions shows that the effect of subway access on price is larger in Haidian District than in Xicheng District, which indicates that subway access raises second-hand home prices faster in Haidian than in Xicheng.

The conclusion is that nonparametric models with interactions are better suited to studying second-hand housing, that the effect of continuous variables on price is nonlinear, and that interaction effects exist among several variables. This thesis studies cross-sectional price differences at a single point in time. The purpose of building a better model is to give home buyers an objective reference when making decisions: a price derived from a large sample of Beijing second-hand homes is a more objective and reliable benchmark than one obtained by comparing two or three listings, so the resulting decision is also more rational.
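The comparison between models with and without interaction terms can be illustrated with a minimal sketch. It is not the thesis's code or data: the data are synthetic, the coefficients are arbitrary, and the names `design_matrix`, `fit_ols`, `rmse`, and the two hypothetical districts A and B are assumptions made for illustration. It fits a log-price OLS model twice, once without and once with a subway-by-district interaction term, and compares held-out error.

```python
import numpy as np

def design_matrix(area, subway, district, interaction):
    """Columns: intercept, area, subway, district, and optionally subway*district."""
    cols = [np.ones_like(area), area, subway, district]
    if interaction:
        cols.append(subway * district)
    return np.column_stack(cols)

def fit_ols(X, y):
    """Ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic data (assumption): NOT the Beijing sample used in the thesis.
rng = np.random.default_rng(0)
n = 2000
area = rng.uniform(40.0, 150.0, n)                # floor area in square meters
subway = rng.integers(0, 2, n).astype(float)      # 1 = near a subway station
district = rng.integers(0, 2, n).astype(float)    # 1 = hypothetical district A, 0 = B
# The true model includes an interaction: subway access matters more in district A.
log_price = (10.0 + 0.002 * area + 0.05 * subway + 0.10 * district
             + 0.15 * subway * district + rng.normal(0.0, 0.05, n))

train, test = slice(0, 1500), slice(1500, n)
results = {}
for interaction in (False, True):
    X = design_matrix(area, subway, district, interaction)
    beta = fit_ols(X[train], log_price[train])
    results[interaction] = (beta, rmse(log_price[test], X[test] @ beta))

beta_int, rmse_int = results[True]
_, rmse_no = results[False]

# Subway effect on log price in each district, read from the interaction model.
subway_effect_b = beta_int[2]
subway_effect_a = beta_int[2] + beta_int[4]

print(f"held-out RMSE without interaction: {rmse_no:.4f}")
print(f"held-out RMSE with interaction:    {rmse_int:.4f}")
print(f"subway effect, district A: {subway_effect_a:.3f}; district B: {subway_effect_b:.3f}")
```

In the interaction model, the subway coefficient alone is the effect in district B, and the subway coefficient plus the interaction coefficient is the effect in district A. A district-specific statement such as the Haidian and Xicheng comparison in the abstract is read off a linear model in the same way.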
[Degree-granting institution]: Taiyuan University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: F299.23

Article ID: 2447018


Article link: https://www.wllwen.com/jingjifazhanlunwen/2447018.html