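Application of Random Forest and Data Visualization to Cotton Aphid Grade Prediction
Topic: data analysis. Entry point: random forest. Source: Shandong Agricultural University, master's thesis, 2017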
[Abstract]: Monitoring and early warning are central to the early control of cotton aphids. Collecting data related to aphid occurrence, analyzing it, and forecasting outbreaks makes it possible to act in advance, reduce the damage aphids cause to cotton, and achieve high yield and high quality in cotton-growing regions. The data analysis in this thesis proceeds along two lines: applying a high-performance machine learning algorithm, and presenting and analyzing the data through data visualization.

First, the random forest algorithm is used to analyze cotton aphid data. A random forest is an ensemble classification algorithm built from multiple decision trees and is widely used for classification and prediction. Decision trees and multiple linear regression are also commonly used for prediction, but different algorithms can reach different accuracy on the same dataset, so the three algorithms are compared experimentally on UCI data and on an armyworm dataset. Cotton aphid grade prediction currently relies mostly on linear regression models. Their drawback is that the choice of how to express the influencing factors is essentially a guess, which limits how well the model reflects the diversity and unpredictability of those factors. Building a random forest model does not depend on how the influencing factors are expressed. In addition, random forest is resistant to overfitting, handles large sample sets quickly, is insensitive to multicollinearity, and achieves relatively high classification accuracy. The comparison experiments show that random forest predicts with high accuracy, so the later experiments apply it to cotton aphid grade prediction. Cotton is an important cash crop in China and plays a major role in the agricultural economy, and cotton aphids are a main cause of reduced cotton yield and quality, which makes early control essential. After handling class imbalance in the collected data and screening the influencing factors, the thesis builds a random forest model on meteorological factor data and data on the natural enemies of cotton aphids, then uses the model to predict the grade of aphid infestation. The experiment shows that the random forest model has a small generalization error and relatively high accuracy in predicting the infestation grade.

Second, data visualization is used for data analysis. As an important means of data analysis, visualization is applied to the cotton aphid data and the meteorological data to provide a reference for aphid control. Multidimensional data visualization, one of the main research topics in the field, reveals relationships between attributes by displaying multidimensional data. The collected data are multidimensional; visualizing the meteorological and aphid data uncovers patterns hidden in the data and supports better analysis and decision making. The display and analysis of the data in this thesis clarify when major aphid outbreaks occur, which provides a reference for timing control measures. Visualization also played an important role in building the model and in presenting and analyzing the experimental results.
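The abstract describes a random forest as an ensemble of decision trees whose predictions are combined into one classification. The sketch below illustrates that idea only; it is not the thesis's implementation. It assumes bootstrap sampling of the training rows, a random feature subset at each node, Gini impurity as the split criterion, and majority voting. The toy rows (temperature, humidity, natural-enemy count) and the two grades are made up for illustration and are not data from the thesis.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) pair with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, n_sub, rng, depth=0, max_depth=5):
    """Grow one tree. A leaf is a bare label (must not be a tuple); an inner node is a 4-tuple."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    # Each node only considers a random subset of the features.
    feature_ids = rng.sample(range(len(rows[0])), n_sub)
    split = best_split(rows, labels, feature_ids)
    if split is None:
        return majority
    f, t = split
    left = [(row, y) for row, y in zip(rows, labels) if row[f] <= t]
    right = [(row, y) for row, y in zip(rows, labels) if row[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], n_sub, rng, depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], n_sub, rng, depth + 1, max_depth),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees trees, each on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(rows[0]) ** 0.5))  # assumed default: sqrt of the feature count
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], n_sub, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


# Made-up toy data: (temperature in C, relative humidity in %, natural-enemy count).
# Grade 0 = light infestation, grade 1 = heavy infestation (hypothetical labels).
rows = [(15 + i * 0.5, 40 + i, 20 + i) for i in range(10)]
rows += [(26 + i * 0.5, 70 + i, 2 + i) for i in range(10)]
labels = [0] * 10 + [1] * 10

forest = fit_forest(rows, labels)
print(predict_forest(forest, (15.0, 40, 29)))
print(predict_forest(forest, (31.0, 80, 2)))
```

Because every tree sees a different bootstrap sample and different feature subsets, individual trees disagree on hard cases, and the vote averages out their errors. In practice a maintained library implementation would normally be preferred over a hand-written one like this.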
[Degree-granting institution]: Shandong Agricultural University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP18; S435.622.1
Article ID: 1693875
Article link: https://www.wllwen.com/shoufeilunwen/xixikjs/1693875.html