Research on an Android Malware Detection Method Based on an Improved Random Forest

Published: 2019-01-25 21:37

[Abstract]: In recent years, as the mobile Internet has grown, smartphones have developed rapidly. Android currently holds a large and still growing share of the global mobile operating system market, and it has at the same time become the main platform on which malware proliferates. Android malware exhibits a wide variety of malicious behaviors and causes serious harm and economic loss to users and to society as a whole. How to analyze and detect Android malware quickly and efficiently has therefore become an active research topic. This thesis first surveys the Android platform, analyzing its system architecture and application components, and then reviews the machine learning algorithms used and the Spark parallel computing framework, laying the groundwork for the subsequent research. Next, to address the shortcoming that the voting rule of the random forest algorithm cannot distinguish between strong and weak classifiers, a weighted voting method is proposed, and on this basis an Improved Random Forest Classification Model (IRFCM) for detecting Android malware is presented. IRFCM takes the Permission and Intent information in the AndroidManifest.xml file as feature attributes, refines them with a feature selection algorithm to generate a set of feature vectors, and then classifies that set. Experimental results show that IRFCM achieves good classification accuracy and efficiency. Finally, to address the long decompilation time of application installation packages and the slow feature extraction in big data settings, IRFCM is combined with the Spark framework to design and implement Android malware detection in a parallel environment. The sample data are converted into Resilient Distributed Datasets (RDDs) under Spark, and feature extraction and classification are performed on the RDDs in parallel on a virtual machine cluster. Compared with the single-machine setting, the experiments in the parallel environment show an effective improvement in the efficiency of Android malware detection.
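The weighted voting idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the tree weights are computed, so the sketch assumes each tree's weight is an accuracy score measured on held-out data. The function names `majority_vote` and `weighted_vote` and the sample numbers are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def majority_vote(predictions):
    """Plain random-forest voting: every tree's vote counts equally."""
    tally = defaultdict(float)
    for label in predictions:
        tally[label] += 1.0
    # Iterate labels in sorted order so ties are broken deterministically.
    return max(sorted(tally), key=tally.get)


def weighted_vote(predictions, weights):
    """Weighted voting: each tree's vote is scaled by its weight.

    Assumption: weights are per-tree accuracy scores in [0, 1]; the
    thesis may define them differently.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per tree prediction")
    tally = defaultdict(float)
    for label, weight in zip(predictions, weights):
        tally[label] += weight
    return max(sorted(tally), key=tally.get)


# Hypothetical forest of five trees classifying one app.
tree_predictions = ["benign", "benign", "benign", "malware", "malware"]
tree_weights = [0.55, 0.52, 0.50, 0.95, 0.93]  # assumed held-out accuracies

print(majority_vote(tree_predictions))                # three weak trees win
print(weighted_vote(tree_predictions, tree_weights))  # two strong trees win
```

In this example the three weak trees outvote the two strong ones under equal voting, while the weighted tally (1.57 for benign against 1.88 for malware) lets the more reliable trees decide, which is the distinction between strong and weak classifiers that the abstract refers to.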
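[Degree-granting institution]: Civil Aviation University of China
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP316; TP309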
