
Soil Map Polygon Disaggregation Based on the Random Forest Algorithm

Published: 2018-07-26 18:26
[Abstract]: Traditional soil maps are drawn as polygons and compiled from lengthy field surveys and aerial photograph interpretation, so both the survey and the mapping are inefficient, time-consuming and laborious, and the accuracy of the resulting maps can no longer keep up with the needs of modern science. Traditional soil maps face three main problems. First, the map scale determines the minimum polygon size: the larger the scale, the smaller the smallest polygon the map can show. Scale limits therefore cause small polygons inside large ones to be dropped during compilation, which simplifies the map both spatially and in its attributes. Second, hand-drawn polygons ignore the gradual spatial change of soils, and the abrupt breaks at polygon boundaries impose discontinuities on soil distributions and properties that actually vary continuously. Third, mapping that relies on expert experience and manual drafting is slow, labor-intensive and prone to human error. Even so, traditional soil maps embody a great deal of expert knowledge; they are valuable historical data and remain an important reference for current research. This study takes the Sheshui River basin in Huajiahe Town, Hong'an County, Huanggang City, Hubei Province as the study area. Starting from the traditional soil map produced by the Second National Soil Survey, together with existing terrain and multispectral data, it uses a random forest model on a GIS platform and in the R language environment to extract knowledge of soil-environment relationships, and then applies the model to spatially disaggregate the original soil map polygons, yielding a soil distribution map with more detailed spatial information.
The study proceeded in five steps:
1) Extracting landscape factors related to the soil-forming environment of the study area. The initial environmental variables comprise soil parent material data, terrain data and multispectral data: slope, aspect, the topographic wetness index and three curvature measures (including curvature along the contour and horizontal curvature) derived from elevation data; the normalized difference vegetation index, the normalized difference water index, the first principal component, skewness, information entropy, variance and mean derived from the multispectral data; and parent material.
2) Designing the sample points. Sample points were allocated by polygon-area weighting, with at least 10 points guaranteed in every polygon, giving 6686 points in total. The environmental factor values were extracted at these points, and the samples were grouped by parent material.
3) Screening the environmental factors. To maintain mapping accuracy and efficiency, factors that contribute little to the model need to be removed; this study screened factors with the variable importance measure provided by the importance() function in R.
4) Determining the model parameters. The two key random forest parameters, mtry and ntree, were chosen from the model's out-of-bag error and from the stability of the model, respectively.
5) Applying the model. The data were modeled with the randomForest package in R, giving four sets of models, one for each of the four parent material units. These models vote on the environmental factor values at every raster cell in the study area, and the vote determines the soil type at each location, from which the soil map of the study area is obtained.
The results show that, compared with the traditional soil map, the disaggregated soil map contains markedly more polygons and a more detailed spatial distribution, revealing more detail.
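The area-weighted sampling design in step 2 can be illustrated with a small Python sketch. The abstract does not state the exact allocation rule, so this is only one plausible reading: every polygon first receives the minimum of 10 points, and the remaining points are shared in proportion to polygon area using a largest-remainder rounding rule. The function name `allocate_samples` and the example areas are hypothetical, not taken from the thesis.

```python
from math import floor


def allocate_samples(areas, total, minimum=10):
    """Allocate `total` sample points to polygons by area, with a per-polygon floor.

    Assumed rule (not confirmed by the source): each polygon gets `minimum`
    points, and the rest are distributed in proportion to polygon area.
    """
    n = len(areas)
    if total < n * minimum:
        raise ValueError("total is too small to give every polygon the minimum")

    remaining = total - n * minimum
    area_sum = sum(areas)

    # Ideal (fractional) share of the remaining points for each polygon.
    shares = [remaining * a / area_sum for a in areas]
    counts = [minimum + floor(s) for s in shares]

    # Largest-remainder rule: points lost to rounding go to the polygons
    # with the largest fractional parts.
    leftover = total - sum(counts)
    order = sorted(range(n), key=lambda i: shares[i] - floor(shares[i]), reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical polygon areas (arbitrary units) and a hypothetical total.
example_counts = allocate_samples([120.0, 30.0, 5.5, 44.5], total=100)
```

With this rule small polygons are never left unsampled, while large polygons still receive most of the points.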
In this study the RF model performed well on the classification problem, which indicates that the soil-environment knowledge it extracts is credible and that it can serve as an efficient method for fine-scale digital soil mapping. In addition, the variable importance measure provided by the random forest algorithm ranks the variables so that factors contributing little to the model can be removed, which preserves classification accuracy while greatly improving computational efficiency and provides a reliable method and basis for disaggregating soil map polygons over large areas in the future.
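The importance-based screening described above can be sketched in Python as a simple rank-and-cut step. The thesis computes importance scores in R and does not report its cutoff, so the `keep_fraction` threshold, the function name `screen_factors` and the scores below are all invented placeholders for illustration, not results from the study.

```python
def screen_factors(importance, keep_fraction=0.1):
    """Rank factors by importance and drop the weak contributors.

    `importance` maps factor name -> importance score (for example, a mean
    decrease in accuracy). A factor is kept when its score is at least
    `keep_fraction` of the top score; this cutoff rule is an assumption.
    """
    if not importance:
        return [], []

    # Sort from most to least important.
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    cutoff = keep_fraction * ranked[0][1]

    kept = [name for name, score in ranked if score >= cutoff]
    dropped = [name for name, score in ranked if score < cutoff]
    return kept, dropped


# Placeholder scores, made up for this example only.
scores = {"slope": 42.0, "ndvi": 30.5, "aspect": 3.1, "twi": 18.0}
kept, dropped = screen_factors(scores)
```

Refitting the model on the kept factors only is what would reduce computation; whether accuracy holds has to be checked against the out-of-bag error after each cut.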
[Degree-granting institution]: Huazhong Agricultural University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: S159.9





Article link: https://www.wllwen.com/kejilunwen/nykj/2146909.html

