Research on Security Detection Techniques for PDF Documents

Published: 2018-10-19 19:58

[Abstract]: In recent years, PDF has become one of the most widely used electronic document formats. Since the first critical vulnerability in Adobe Reader (CVE-2008-2549) was disclosed in 2008, PDF files have increasingly been used as an attack vector. Compared with other JavaScript-based attacks, however, PDF-based attacks have received relatively little research attention, which motivates this study of security detection for PDF documents. This thesis first presents the background and current state of PDF document security research, reviewing existing work in three categories: purely static detection, purely dynamic detection, and combined static-dynamic detection. It then describes the PDF document format and its security issues, explaining each component of the format in detail. On the security side, it analyzes the JavaScript module of PDF documents in depth, since this module is the foundation and focus of PDF document security problems. For static detection, the thesis explains the principles of statically detecting malicious PDF documents, then improves and implements a static detection scheme. JavaScript code is first extracted from the PDF document; deobfuscation measures added to the extraction step allow the code to be recovered correctly, which makes the subsequent feature analysis more accurate. To account for the specific characteristics of PDF security problems, a model derived from the one-class support vector machine is designed, giving a more complete machine learning model whose added sub-models can classify the attack patterns of malicious PDF documents. Compared with traditional schemes, this static detection scheme is more accurate and provides more useful information.
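The extraction-with-deobfuscation step can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it only assumes two well-known obfuscation tricks permitted by the PDF syntax, namely `#XX` hex escapes inside names (so `/J#53` is the same name as `/JS`) and scripts stored as hex strings or as literal strings with backslash escapes. The function names are hypothetical, and scripts held in compressed or indirect stream objects are deliberately not handled.

```python
import re
from typing import List

# A PDF name: '/' followed by characters that are not whitespace or delimiters.
NAME_RE = re.compile(rb"/[^\s/<>\[\](){}%]+")

# Single-character escapes inside PDF literal strings (simplified set).
ESCAPES = {ord("n"): b"\n", ord("r"): b"\r", ord("t"): b"\t",
           ord("b"): b"\b", ord("f"): b"\f"}


def decode_pdf_name(name: bytes) -> bytes:
    """Undo #XX hex escapes in a name, e.g. b'/J#53' -> b'/JS'."""
    return re.sub(rb"#([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), name)


def read_literal_string(buf: bytes) -> bytes:
    """Read a literal string that starts at buf[0] == '(' (simplified)."""
    out = bytearray()
    depth = 0
    i = 0
    while i < len(buf):
        c = buf[i]
        if c == 0x5C and i + 1 < len(buf):           # backslash escape
            octal = re.match(rb"[0-7]{1,3}", buf[i + 1:i + 4])
            if octal:                                 # \ddd octal character code
                out.append(int(octal.group(0), 8) & 0xFF)
                i += 1 + len(octal.group(0))
            else:                                     # \n, \(, \), \\ and so on
                out += ESCAPES.get(buf[i + 1], buf[i + 1:i + 2])
                i += 2
            continue
        if c == 0x28:                                 # '(' opens a nesting level
            depth += 1
            if depth > 1:
                out.append(c)
        elif c == 0x29:                               # ')' closes a nesting level
            depth -= 1
            if depth == 0:
                return bytes(out)
            out.append(c)
        else:
            out.append(c)
        i += 1
    return bytes(out)                                 # unterminated string


def extract_javascript(data: bytes) -> List[str]:
    """Return scripts stored directly after a (possibly obfuscated) /JS key."""
    scripts = []
    for m in NAME_RE.finditer(data):
        if decode_pdf_name(m.group(0)) != b"/JS":
            continue
        rest = data[m.end():].lstrip()
        if rest.startswith(b"("):                     # literal string value
            scripts.append(read_literal_string(rest).decode("latin-1"))
        elif rest.startswith(b"<") and not rest.startswith(b"<<"):
            end = rest.find(b">")                     # hex string value
            if end == -1:
                continue
            digits = re.sub(rb"\s+", b"", rest[1:end]).decode("ascii", "ignore")
            if len(digits) % 2:                       # odd length: pad with 0
                digits += "0"
            try:
                scripts.append(bytes.fromhex(digits).decode("latin-1"))
            except ValueError:
                continue
    return scripts
```

For example, calling `extract_javascript` on a fragment such as `<< /S /J#61vaScript /J#53 (app.alert\(1\);) >>` recovers the script text even though the `/JS` key is written in escaped form, which a plain substring search for `/JS` would miss.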
For dynamic detection, this thesis explains the principles of dynamically detecting malicious PDF documents and builds a complete dynamic detection system. PDF documents from which shellcode can be extracted are checked directly with libemu, a shellcode emulator; all other documents undergo detailed behavior analysis in a sandbox, using Cuckoo Sandbox. Because the system makes full use of the static detection results and adds the emulator stage, it retains the high accuracy of dynamic detection while reducing detection time and improving efficiency, compared with dynamic detection that relies on a sandbox alone. Finally, the thesis describes and implements the complete PDF document security detection system and tests it on PDF samples collected from the Internet. The experimental results show that the system makes full use of the characteristics of PDF security problems and can detect and analyze the security of PDF documents accurately and quickly.
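The two-path routing described for dynamic detection can be sketched as follows. This is an illustration under stated assumptions, not the thesis's system: it assumes shellcode is carried in JavaScript as `%uXXXX` or `%XX` sequences of the kind passed to `unescape()`, and it uses a hypothetical minimum payload length of 32 bytes. The sketch only decides which path a document would take; it does not call libemu or Cuckoo Sandbox, whose interfaces are not described in the abstract.

```python
import re
from typing import List

# A run of %uXXXX / %XX escapes, as commonly passed to unescape().
ESCAPE_RUN_RE = re.compile(r"(?:%u[0-9A-Fa-f]{4}|%[0-9A-Fa-f]{2})+")
ESCAPE_UNIT_RE = re.compile(r"%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})")


def decode_unescape_payload(text: str) -> bytes:
    """Turn %uXXXX / %XX escapes into the bytes they occupy in memory."""
    out = bytearray()
    for m in ESCAPE_UNIT_RE.finditer(text):
        if m.group(1):
            # %uXXXX is one UTF-16 code unit, stored little-endian on x86.
            out += int(m.group(1), 16).to_bytes(2, "little")
        else:
            out.append(int(m.group(2), 16))
    return bytes(out)


def shellcode_candidates(js: str, min_len: int = 32) -> List[bytes]:
    """Decoded escape runs long enough to plausibly hold shellcode.

    min_len is an assumed threshold chosen for illustration only.
    """
    blobs = (decode_unescape_payload(m.group(0))
             for m in ESCAPE_RUN_RE.finditer(js))
    return [blob for blob in blobs if len(blob) >= min_len]


def choose_dynamic_path(js: str) -> str:
    """'emulator' if a shellcode candidate can be extracted, else 'sandbox'."""
    return "emulator" if shellcode_candidates(js) else "sandbox"
```

In this sketch, a script containing a long `unescape("%u9090%u9090...")` payload is routed to the emulator path, while a script with no extractable payload falls through to the slower but more general sandbox path, which mirrors the time saving the abstract attributes to the emulator stage.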
[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree awarded]: 2015
[Classification number]: TP309

[References]