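Research on Tobacco Leaf Grading Based on Clustering and Weighted K-Nearest Neighbors
Topic: spectral analysis technology. Focus: tobacco leaf grading. Source: Zhengzhou University (郑州大学), 2017 master's thesis. Type: degree thesis.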
[Abstract]: At the purchasing stage, grading tobacco leaves correctly and objectively both encourages tobacco farmers to keep planting and protects the economic interests of cigarette manufacturers. Manual grading, as practiced today, is highly subjective and costly in labor and materials; different experts may assign the same leaf to different grades. Objective, fast, and highly accurate intelligent grading is therefore urgently needed. Current research on intelligent tobacco grading concentrates on two approaches: grading from leaf images and grading from infrared spectra. Because the spectral features of a leaf better reflect oil content, chroma, body, and maturity, all factors closely tied to its grade, this thesis studies tobacco grading based on spectra. The recognition rate and overall speed of an intelligent grading system depend heavily on the grading model chosen and on the amount of characteristic spectral data that must be acquired per sample. To build a real-time intelligent grading system with a high recognition rate, the thesis carried out the following work:
1. Spectrum acquisition, preprocessing, and detection of isolated samples. Reflectance spectra of 642 tobacco leaves (13 grades) were collected with a UV-3600 spectrometer. The spectra were normalized to reduce the effect on grading of noise caused by baseline drift and of differences between feature values. Because some samples may be mislabeled (isolated samples), the training set used to build the grading model has to be selected. Cosine distance, Euclidean distance, and the correlation coefficient method were each applied, with thresholds chosen by statistical analysis, to detect the isolated samples in each grade and to determine the training set.
2. Construction of grading models and improvement of the K-nearest neighbor algorithm. Grading models based on the support vector machine (SVM), extreme learning machine (ELM), K-nearest neighbors (KNN), and weighted K-nearest neighbors were built, with the recognition rate of the model used as the fitness function. On the full spectrum, the best test-set accuracies of ELM and SVM were 85.75% and 91.02% respectively. Two weighting schemes were used for weighted KNN. In the first, all training samples of a grade share the same weight, equal to the reciprocal of the number of samples in that grade. In the second, the K nearest neighbors are found first, each is given a weight that is negatively correlated with its distance, and the leaf is assigned a grade by summing the weights per grade. Combining the two schemes gives a recognition rate of 90.77%. Weighted KNN classifies better than traditional KNN and has lower computational complexity than SVM and ELM, so it was chosen as the classifier for grade determination.
3. Preliminary feature screening based on the idea of clustering. Taking into account both the within-class scatter and the between-class scatter of each feature, a discriminant function D was constructed to tell good features from bad ones, and the features to the right of an inflection point of D were removed. The best grading result was obtained at the sixth inflection point, which leaves 326 features; test-set accuracy rose from 90.77% to 94.59%, so the recognition rate improved while the number of features fell.
4. Deeper feature selection. Binary particle swarm optimization (BPSO), a genetic algorithm (GA), and correlation coefficient analysis (CC) were used to screen the features further. BPSO gave the better result: the number of features dropped from the original 451 to 143, which saves 68.3% of the time spent acquiring spectra, and the recognition rate rose from 90.77% to 93.69%, an increase of 2.92 percentage points.
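The combined weighting described in item 2 can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the abstract only says that the distance weight is negatively correlated with distance, so the inverse-distance form `1 / (d + eps)`, the use of Euclidean distance, the way the two weights are multiplied, and the function name `weighted_knn_predict` are all assumptions.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k=5, eps=1e-9):
    """Predict a grade for spectrum x by summing neighbor weights per grade.

    Assumed weight of a neighbor = (1 / size of its grade) * (1 / (distance + eps)).
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x = np.asarray(x, dtype=float)

    # Scheme 1: every sample of a grade gets the reciprocal of the grade's size.
    classes, counts = np.unique(y_train, return_counts=True)
    class_weight = dict(zip(classes.tolist(), (1.0 / counts).tolist()))

    # Scheme 2: find the k nearest neighbors (Euclidean distance assumed).
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]

    # Sum the combined weights per grade; the largest total wins.
    scores = {c: 0.0 for c in class_weight}
    for i in nearest:
        label = y_train[i].item()
        scores[label] += class_weight[label] / (dists[i] + eps)
    return max(scores, key=scores.get)
```

With a toy training set of three samples labeled "A" near the origin and two labeled "B" near (5, 5), a query at (4.5, 5) with `k=3` picks up two "B" neighbors and one distant "A" neighbor, and the summed weights favor "B". The reciprocal class-size weight keeps a grade with many training leaves from dominating the vote purely through its size.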
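The feature screening in item 3 scores each spectral feature by its within-class and between-class scatter. The abstract does not give the formula for D or the rule for locating inflection points, so the sketch below substitutes a Fisher-style ratio (between-class scatter divided by within-class scatter) and a fixed cutoff `n_keep` in place of the inflection point. Both choices, and the names `scatter_ratio` and `keep_top_features`, are hypothetical stand-ins.

```python
import numpy as np

def scatter_ratio(X, y):
    """Per-feature ratio of between-class to within-class scatter (assumed form of D)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        # Between-class scatter: how far the class mean sits from the overall mean.
        between += len(Xc) * (class_mean - overall_mean) ** 2
        # Within-class scatter: spread of the class around its own mean.
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    return between / (within + 1e-12)

def keep_top_features(X, y, n_keep):
    """Return the sorted indices of the n_keep features with the largest ratio."""
    d = scatter_ratio(X, y)
    order = np.argsort(d)[::-1]  # best-separating features first
    return np.sort(order[:n_keep])
```

A feature whose values cluster tightly within each grade but differ across grades gets a large ratio and is kept; a feature that varies as much inside a grade as between grades gets a small ratio and is dropped. In the thesis the cutoff is set at an inflection point of the sorted D values rather than at a preset count.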
[Degree-granting institution]: Zhengzhou University (郑州大学)
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TS442
Article ID: 1599902
Article link: https://www.wllwen.com/shoufeilunwen/boshibiyelunwen/1599902.html