Application of Random Forest and kNN Models Based on Outlier Handling to EEG Data

Published: 2018-07-05 00:51

Topic: electroencephalogram (EEG) + classification; Source: Lanzhou University, 2017 master's thesis

[Abstract]: Recently, with rapid advances in deep learning and artificial intelligence, researchers have begun applying these techniques to problems involving the electroencephalogram (EEG). EEG helps doctors diagnose brain diseases more effectively, and it helps researchers better understand the relationship between brain waves and behavior, which in turn supports the development of more intelligent devices. This thesis studies EEG by taking the data recorded by an EEG instrument as input and the corresponding eye state of the subject as output. To improve the reliability and accuracy of eye-state classification from EEG data, and drawing on the regularities of EEG data and on how the signal changes when the eyes open and close, the thesis proposes random forest and kNN models based on outlier handling. The raw data are first preprocessed, which mainly covers missing-value handling, outlier handling, and consistency analysis; for the dataset used here, outliers are handled using summary statistics and per-dimension visualization plots. After outlier handling, random forest and kNN models are built. For random forest, the discussion centers on the model's out-of-bag (OOB) error rate and on variable importance. For kNN, because the value of k is critical to the model, k is chosen by cross-validation on the training set, and the chosen k is then used for evaluation on the test set. Finally, to demonstrate the effectiveness of random forest and kNN on this dataset, decision tree, Bagging, and SVM models are used as baselines for comparison, and the effect of class imbalance in the dataset is also discussed. The results show that the proposed outlier-handling-based random forest and kNN models achieve better prediction accuracy: 92.9392% for random forest and 97.0946% for kNN. Both are therefore effective predictive models for this EEG data, and kNN in particular gives the best predictions among the methods compared in the thesis.
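The pipeline the abstract describes (remove outliers, then choose k for kNN by cross-validation on the training data) can be sketched roughly as follows. This is an illustrative sketch only, not the thesis's implementation: the z-score threshold is an assumed stand-in for the statistics-and-plots outlier screening, the data are synthetic two-channel values with a few injected spikes, and every function name (`make_synthetic`, `remove_outliers`, `knn_predict`, `cv_accuracy`, `select_k`) is hypothetical.

```python
import random
import statistics
from collections import Counter


def make_synthetic(n=200, seed=0):
    """Synthetic two-channel data with two classes (NOT the thesis dataset)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        y = i % 2  # alternate the two eye-state classes
        rows.append([rng.gauss(4300 + 40 * y, 15), rng.gauss(4000 - 40 * y, 15)])
        labels.append(y)
    for i in (10, 50, 120):  # inject a few extreme spikes as artificial outliers
        rows[i][0] = 300000.0
    return rows, labels


def remove_outliers(rows, labels, z_max=3.0):
    """Drop samples with any feature more than z_max std devs from its mean.

    Assumption: a plain z-score rule; the thesis does not state its exact rule.
    """
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against zero spread
    kept = [
        (r, y) for r, y in zip(rows, labels)
        if all(abs(v - m) / s <= z_max for v, m, s in zip(r, means, stds))
    ]
    return [r for r, _ in kept], [y for _, y in kept]


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def cv_accuracy(rows, labels, k, folds=5):
    """Accuracy of kNN under interleaved k-fold cross-validation."""
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, len(rows), folds))
        train_x = [r for i, r in enumerate(rows) if i not in test_idx]
        train_y = [y for i, y in enumerate(labels) if i not in test_idx]
        correct += sum(
            knn_predict(train_x, train_y, rows[i], k) == labels[i] for i in test_idx
        )
    return correct / len(rows)


def select_k(rows, labels, candidates=(1, 3, 5, 7, 9)):
    """Return the candidate k with the highest cross-validated accuracy."""
    scores = {k: cv_accuracy(rows, labels, k) for k in candidates}
    return max(scores, key=scores.get), scores


rows, labels = make_synthetic()
clean_x, clean_y = remove_outliers(rows, labels)
best_k, scores = select_k(clean_x, clean_y)
print(len(rows), len(clean_x), best_k, round(scores[best_k], 4))
```

In a full workflow, the selected k would then be used once on a held-out test set, as the abstract describes; the sketch stops at model selection and does not attempt to reproduce the reported accuracies.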
[Degree-granting institution]: Lanzhou University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: R318;TP18
