
Research on an Improved LMS-KNN Nearest Neighbor Classification Method

Published: 2018-08-15 19:12
【Abstract】: As one of the classic machine learning algorithms, nearest neighbor classification requires no parameter estimation, is easy to implement, and is well suited to multi-class problems; in recent years it has been widely applied in advertising, chatbots, network security, healthcare, marketing, and other fields. Among its variants, nearest neighbor classification based on local mean and class mean (LMS-KNN) improves on K-nearest neighbor (KNN) classification by addressing KNN's sensitivity to outliers and its failure to exploit the global information of the training samples. Although LMS-KNN raises classification accuracy and efficiency to some extent, it still has drawbacks: class imbalance degrades its accuracy, and the algorithm involves many parameter settings, such as the choice of the neighborhood size K, the determination of the weights, and the choice of distance metric. To further improve the classification accuracy of LMS-KNN, this thesis carries out the following work:

1) Surveys and analyzes several common nearest neighbor classification methods and the local-mean/class-mean nearest neighbor algorithm, compares their principles, advantages, and disadvantages, and briefly introduces the optimization algorithms used in the thesis.

2) To counter the effect of imbalanced data on LMS-KNN classification accuracy, preprocesses the data with an iterative nearest-neighbor oversampling algorithm, then classifies the resulting approximately balanced data set with a semi-supervised local-mean/class-mean classifier.

3) Determines the parameters of the LMS-KNN classifier with cross-validation and a traditional iterative algorithm: the cross-validation error of the classifier is first modeled, the weight of the class-mean vector is then expressed as a formula based on objective decision information, and finally a step-size-optimized unified iterative method selects the weights, improving the classification accuracy and efficiency of the traditional algorithm while balancing subjective and objective decision rules.

4) To optimize the parameter determination of LMS-KNN, exploits the ability of the genetic algorithm (GA) to solve nonlinear, multi-objective optimization problems without relying on domain-specific knowledge, and proposes a genetic-algorithm-based local-mean/class-mean nearest neighbor classifier: the class-mean weights form the initial population, classification error serves as the fitness function, and genetic iteration selects the best class-mean feature weights. Experimental comparison with the traditional KNN, LM-KNN (a local mean-based nonparametric classifier), and LMS-KNN algorithms shows that the proposed method effectively searches out suitable feature weights on UCI data sets and achieves better classification accuracy.
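The core LMS-KNN decision rule combines, for each class, the distance to the mean of the query's k nearest neighbors in that class (the local mean) with the distance to the overall class mean. The sketch below illustrates one plausible form of this rule under assumptions not fixed by the abstract: Euclidean distance and a single fixed weight `w` shared by all classes (the thesis tunes these weights; here they are hard-coded).

```python
import numpy as np

def lms_knn_predict(X_train, y_train, x, k=3, w=0.5):
    """Classify x by combining local-mean and class-mean distances.

    For each class c:
      d_local(c) = distance from x to the mean of its k nearest
                   training points belonging to c
      d_class(c) = distance from x to the overall mean of class c
    The predicted class minimizes w*d_local + (1-w)*d_class.
    The weight w is a hyperparameter (fixed here for illustration).
    """
    scores = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        # k nearest neighbors of x within class c
        d = np.linalg.norm(Xc - x, axis=1)
        idx = np.argsort(d)[:k]
        local_mean = Xc[idx].mean(axis=0)
        class_mean = Xc.mean(axis=0)
        scores[c] = (w * np.linalg.norm(x - local_mean)
                     + (1 - w) * np.linalg.norm(x - class_mean))
    return min(scores, key=scores.get)

# Toy usage: two well-separated clusters.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])
print(lms_knn_predict(X, y, np.array([0.15, 0.1]), k=2))  # → 0
```

Setting `w = 1` recovers a pure local-mean (LM-KNN-style) rule, while `w = 0` classifies by the nearest class centroid; the thesis's contribution lies precisely in how this balance is chosen.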
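Contribution 2 preprocesses imbalanced data with nearest-neighbor oversampling before classification. The exact iterative variant is not specified in the abstract; the sketch below shows a SMOTE-style scheme as one plausible reading, where synthetic minority samples are interpolated between a minority point and one of its k nearest minority-class neighbors.

```python
import numpy as np

rng = np.random.default_rng(0)

def oversample_minority(X_min, n_new, k=3):
    """Generate n_new synthetic minority samples by interpolating a
    random seed point toward one of its k nearest minority neighbors
    (a SMOTE-style scheme; the thesis's iterative variant may differ)."""
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        # nearest neighbors excluding the seed point itself
        nbrs = np.argsort(d)[1:k + 1]
        j = rng.choice(nbrs)
        t = rng.random()  # interpolation factor in [0, 1)
        new.append(X_min[i] + t * (X_min[j] - X_min[i]))
    return np.vstack(new)

# Toy usage: grow a 4-sample minority class to 8 samples.
X_min = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]])
X_syn = oversample_minority(X_min, n_new=4)
print(X_syn.shape)  # → (4, 2)
```

Because each synthetic point lies on a segment between two existing minority samples, the approximately balanced data set stays inside the minority class's region, which is what makes the subsequent semi-supervised LMS-KNN classification meaningful.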
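Contribution 4 evolves the class-mean weights with a genetic algorithm whose fitness is the classification error. The following minimal sketch assumes details the abstract does not fix: a single scalar weight as the gene, leave-one-out error as the fitness function, truncation selection, blend crossover, and Gaussian mutation.

```python
import numpy as np

rng = np.random.default_rng(0)

def loo_error(X, y, w, k=2):
    """Leave-one-out error of the local-mean/class-mean rule for weight w."""
    errs = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        Xtr, ytr, x = X[mask], y[mask], X[i]
        best, best_s = None, np.inf
        for c in np.unique(ytr):
            Xc = Xtr[ytr == c]
            d = np.linalg.norm(Xc - x, axis=1)
            lm = Xc[np.argsort(d)[:k]].mean(axis=0)  # local mean
            cm = Xc.mean(axis=0)                     # class mean
            s = w * np.linalg.norm(x - lm) + (1 - w) * np.linalg.norm(x - cm)
            if s < best_s:
                best, best_s = c, s
        errs += best != y[i]
    return errs / len(X)

def ga_select_weight(X, y, pop=10, gens=15, mut=0.1):
    """Evolve a population of candidate weights; fitness = LOO error."""
    P = rng.random(pop)                      # initial population in [0, 1]
    for _ in range(gens):
        fit = np.array([loo_error(X, y, w) for w in P])
        elite = P[np.argsort(fit)[: pop // 2]]  # selection: keep better half
        children = []
        while len(children) < pop - len(elite):
            a, b = rng.choice(elite, 2)
            c = (a + b) / 2                  # crossover: blend parents
            c += rng.normal(0, mut)          # mutation: Gaussian noise
            children.append(np.clip(c, 0.0, 1.0))
        P = np.concatenate([elite, children])
    fit = np.array([loo_error(X, y, w) for w in P])
    return P[np.argmin(fit)]

# Toy usage on two separable clusters.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.0],
              [3.0, 3.0], [3.1, 3.2], [2.9, 3.1], [3.2, 2.9]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
w_best = ga_select_weight(X, y)
```

The thesis uses a vector of per-class (or per-feature) weights as the chromosome rather than a single scalar; the same loop applies with array-valued genes and elementwise crossover and mutation.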
【Degree-granting institution】: University of Electronic Science and Technology of China
【Degree level】: Master's
【Year of conferral】: 2017
【CLC number】: TP181




Article ID: 2185147


Link to this article: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2185147.html


