Research and Implementation of PE Virus File Clustering Technology
[Abstract]: The Internet has become an indispensable part of daily life. Alongside its convenience, however, Internet security threats hang overhead like a sword and can cause serious harm to society at any time. Because Windows remains the mainstream operating system, PE viruses have the widest reach. The number of new viruses grows dramatically every year, and security vendors are overwhelmed. Studying how to cluster PE virus files automatically by family is therefore of great practical significance. Existing static feature extraction for PE virus files does not consider the temporal (sequential) features of their n-grams. To address this, this thesis analyzes the principle of Word2vec and proposes an algorithm for extracting temporal features from PE virus files. It also studies the structure of PE files and the principles of clustering algorithms, designs a PE virus file clustering system, and uses that system to verify the proposed algorithm.
The main contents and contributions are as follows: (1) After analyzing the problem that the temporal features of PE virus files are not considered, a temporal feature extraction algorithm is proposed. Current research on static feature extraction from PE files focuses on selecting n-gram features by information gain and on extracting API function calls, string information, and similar attributes. The proposed algorithm is therefore built on a detailed analysis of the PE file structure. (2) The temporal feature extraction algorithm for PE virus files is designed and implemented. Word2vec converts the n-gram words in PE files into word vectors, which serve as the basis for measuring similarity between words. K-means then groups words with similar context and semantics into the same class, which reduces the dimension of the temporal feature vector. (3) The PE virus file clustering system is designed and implemented. The system consists of two parts. The first part verifies the validity of the temporal features using an SGD-based multi-class classification algorithm. The second part applies the temporal features to the clustering of PE virus files and compares the clustering results of K-means and the density peaks algorithm. (4) The PE virus file clustering system is tested on a set of virus samples. The results show that the system achieves the expected clustering effect and that the temporal feature extraction algorithm is practical.
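The feature extraction step in contribution (2) can be illustrated with a minimal sketch. It is not the thesis implementation. In the thesis the word vectors come from Word2vec; here they are hand-written 2-D toy values standing in for a trained model. The function names (`byte_ngrams`, `kmeans`, `cluster_histogram`), the choice of 2-byte n-grams, and the sample bytes are all assumptions made for illustration.

```python
import random
from collections import Counter


def byte_ngrams(data, n=2):
    """Slide a window of n bytes over the data; each window becomes a hex 'word'."""
    return [data[i:i + n].hex() for i in range(len(data) - n + 1)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster index per input vector."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def cluster_histogram(tokens, token_to_cluster, k):
    """Describe a file by the fraction of its n-gram words in each word cluster."""
    counts = Counter(token_to_cluster[t] for t in tokens if t in token_to_cluster)
    total = sum(counts.values()) or 1
    return [counts[c] / total for c in range(k)]


# Hypothetical word vectors (stand-ins for Word2vec output, not real embeddings).
toy_vectors = {
    "4d5a": [0.9, 0.1],
    "5a90": [1.0, 0.0],
    "9000": [0.0, 1.0],
    "0003": [0.1, 0.9],
}

K = 2
words = list(toy_vectors)
labels = kmeans([toy_vectors[w] for w in words], K)
word_to_cluster = dict(zip(words, labels))

# Hypothetical sample bytes; a real system would read them from a PE file.
sample = bytes.fromhex("4d5a900003")
feature = cluster_histogram(byte_ngrams(sample), word_to_cluster, K)
print(word_to_cluster, feature)
```

The resulting vector has length K regardless of how many distinct n-grams occur, which is the dimension reduction the abstract describes. Such vectors could then be passed to a classifier or a clustering algorithm.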
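[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP393.08; TP311.13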