Research on Classification and Regression Methods for Spectral Data Based on Kernel Density Estimation
Published: 2019-01-01 13:05
[Abstract]: This thesis studies two kinds of spectra, remote sensing spectra and Fourier infrared transmission spectra, and uses kernel density estimation for hyperspectral remote sensing data classification and for regression analysis of Fourier infrared transmission spectra. The main difficulty in hyperspectral remote sensing data classification is handling the strong correlation among spectral bands effectively, while the main difficulty in regression analysis of Fourier infrared transmission spectra is accurately identifying the hidden peaks in a spectral line. Existing hyperspectral classification methods based on spectral-line matching and on statistical features (for example, support vector machines) tend to ignore the correlation among spectral bands, which limits their performance. Meanwhile, traditional least-squares fitting methods suffer from ill-conditioned normal equations, so they often fail to identify hidden peaks, which leads to inaccurate spectral analysis results. To address these difficulties, the thesis starts from the probability density function of the spectral data and carries out three pieces of work. It proposes FNBEPNSK, a flexible naive Bayesian classifier based on equivalent probability; NNBC, a non-naive Bayesian classifier based on joint probability density function estimation; and KREFI, a kernel regression ensemble algorithm based on the fuzzy integral.
1) The thesis studies how seven kernel functions (Gaussian, Uniform, Triangular, Epanechnikov, Biweight, Triweight and Cosine) affect the performance of a naive Bayesian classifier based on marginal probability density function estimation. The Gaussian kernel is a smooth kernel and the other six are non-smooth kernels. By analyzing the effectiveness of the seven kernels in probability density function estimation, the thesis gives the conditions under which non-smooth kernels work and, to address their shortcomings in probability density function estimation, proposes FNBEPNSK. Results on standard UCI data sets and real hyperspectral data sets show that using equivalent probability significantly improves the classification performance of the naive Bayesian classifier.
2) To handle the correlation among the condition attributes of samples, the thesis introduces joint probability density estimation into the design of hyperspectral remote sensing classifiers and proposes NNBC. To determine the bandwidth parameter in joint probability density estimation, it designs a parameter selection criterion based on minimizing the mean integrated squared error, which ensures that the optimal bandwidth is selected, and confirms the effectiveness of this criterion by comparing it with a criterion based on minimizing the integrated variance. It also proves theoretically that joint probability density function estimation is optimal when the attributes are strongly correlated. Finally, experiments on UCI data sets and real hyperspectral data sets show that the non-naive Bayesian classifier obtains higher-quality probability density function estimates and significantly improves on the classification accuracy of the naive Bayesian classifier.
3) Because the Fourier infrared transmission spectra of silicon-based thin films fluctuate at high frequency and contain hidden peaks that are hard to identify, the thesis designs a kernel regression ensemble method that takes full account of the probability distribution information of the spectral data and offers high stability and high accuracy. An experimental comparison on six standard test functions confirms the high variance of PCKE1 and PCKE2, the Priestley-Chao kernel regressors whose bandwidth is selected by cross-validation. KREFI, a kernel regression ensemble model based on the fuzzy integral, is then designed to fuse four different Priestley-Chao kernel regressors and so improve the stability of kernel regression; the fuzzy measures in the fuzzy integral are determined with three different particle swarm optimization algorithms. Finally, KREFI is validated on the standard test functions and on 28 silicon-based thin-film Fourier infrared transmission spectra of different forms. The results show that KREFI achieves good regression performance and, to a certain extent, solves the problem of hidden peaks that cannot be identified.
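As a rough illustration of the kernel-based naive Bayesian classifier the abstract describes, the sketch below estimates each per-class marginal density with one of the seven named kernels and combines them under the naive independence assumption. It is not the thesis's FNBEPNSK: the names `KERNELS`, `kde`, `fit`, `predict`, the single shared bandwidth `h`, and the small floor `eps` are all assumptions made for this example. In particular, `eps` is only a stand-in for handling zero densities and is not the equivalent-probability technique, whose details the abstract does not give.

```python
import numpy as np

# The seven kernels K(u) named in the abstract. All except the Gaussian
# vanish outside |u| <= 1 (the boolean mask acts as a 0/1 factor).
KERNELS = {
    "gaussian": lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi),
    "uniform": lambda u: 0.5 * (np.abs(u) <= 1),
    "triangular": lambda u: (1 - np.abs(u)) * (np.abs(u) <= 1),
    "epanechnikov": lambda u: 0.75 * (1 - u**2) * (np.abs(u) <= 1),
    "biweight": lambda u: (15 / 16) * (1 - u**2) ** 2 * (np.abs(u) <= 1),
    "triweight": lambda u: (35 / 32) * (1 - u**2) ** 3 * (np.abs(u) <= 1),
    "cosine": lambda u: (np.pi / 4) * np.cos(np.pi * u / 2) * (np.abs(u) <= 1),
}


def kde(x, samples, h, kernel="gaussian"):
    """Univariate kernel density estimate at scalar x with bandwidth h."""
    u = (x - samples) / h
    return KERNELS[kernel](u).sum() / (len(samples) * h)


def fit(X, y):
    """Store the training rows of each class; KDE needs no other training."""
    return {label: X[y == label] for label in np.unique(y)}


def predict(class_samples, x, h, kernel="gaussian", eps=1e-12):
    """Return the class maximizing log prior + sum of log marginal densities."""
    n_total = sum(len(s) for s in class_samples.values())
    scores = {}
    for label, samples in class_samples.items():
        log_post = np.log(len(samples) / n_total)
        for j, xj in enumerate(x):
            density = kde(xj, samples[:, j], h, kernel)
            # Assumption: floor zero densities at eps so the log is defined.
            log_post += np.log(max(density, eps))
        scores[label] = log_post
    return max(scores, key=scores.get)


# Toy two-class data (hypothetical, for illustration only).
X = np.array([[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
model = fit(X, y)
for name in KERNELS:
    print(name, predict(model, np.array([0.1, 0.0]), h=1.0, kernel=name))
```

With a compact kernel, a query point farther than `h` from every training sample of every class gets a zero density for all classes, so the classes can no longer be told apart. This is one plausible reading of the shortcoming of non-smooth kernels that the abstract mentions.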
[Degree-granting institution]: Hebei University
[Degree level]: Doctoral
[Year degree conferred]: 2014
[Classification number]: O438.2; TP79
Article ID: 2397607
Article link: https://www.wllwen.com/guanlilunwen/gongchengguanli/2397607.html