Research on the Application of Machine Learning Algorithms in Data Classification

Published: 2018-02-15 07:12

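Keywords: leaf classification; support vector machine; particle swarm optimization; principal component analysis; cancer classification; convolutional neural network　Source: North University of China (中北大学), 2017 master's thesis　Document type: degree thesis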

[Abstract]: Many practical problems, such as weather forecasting, product recommendation, bioinformatics, and network detection, can be cast as data classification problems in data processing, and data processing research in turn builds on machine learning. As science and technology have advanced, the range of applications of machine learning algorithms has become very broad. This thesis mainly presents two machine learning approaches: the support vector machine (SVM) optimized by particle swarm optimization (PSO), and the convolutional neural network (CNN). It studies the PSO-optimized SVM for prediction in leaf classification and cancer gene classification, and the CNN for image classification. (1) A data preprocessing model is built from the features of various leaves: the data are first normalized, principal component analysis (PCA) is used to extract 3 principal components from the 16 features, and an SVM optimized by PSO is then built and used to classify the leaf data. The experiments show that, compared with the optimal parameters found by a genetic algorithm and by grid search, the PSO-optimized SVM attains the highest accuracy of the three, 94.1%. (2) The PSO-optimized SVM model is applied to cancer gene classification: several different sets of experimental data are selected to classify gene samples of post-surgery cancer patients as recurrent or non-recurrent. Taking the experimental results together, the PSO-optimized SVM gives the best classification performance of the three classification methods. (3) The CNN is applied to image processing: the filter functions in the convolutional and pooling layers are optimized to improve performance, a CNN with a given structure is then constructed, and the model is used to classify an image data set, achieving the expected classification results.
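The parameter search described in item (1) can be illustrated with a minimal particle swarm sketch. This is not the thesis's implementation: the function names, swarm settings (`inertia`, `c1`, `c2`), search bounds, and the objective `toy_error` are all assumptions made for illustration. In the thesis's setting the objective would presumably be the cross-validated error of an SVM as a function of its parameters; here a simple quadratic stands in for it so the example runs with the standard library alone.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=60,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position; the swarm shares a global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move, then clamp the coordinate to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_error(params):
    """Hypothetical stand-in for SVM validation error over (log10 C, log10 gamma)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


# Search log10(C) in [-3, 3] and log10(gamma) in [-5, 1] (assumed ranges).
best, err = pso_minimize(toy_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
print("best (log10 C, log10 gamma):", best, "error:", err)
```

To use this with a real classifier, `toy_error` would be replaced by a function that trains an SVM with the given parameters on the normalized, PCA-reduced features and returns its validation error. Searching in log scale is a common choice for these parameters, but it is an assumption here rather than something the abstract states.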
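[Degree-granting institution]: North University of China (中北大学)
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP18;TP391.41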

[References]