Research on Precise Detection Techniques for Buffer Overflow Vulnerabilities

Published: 2019-05-23 21:39

[Abstract]: With the rapid development of information technology, software has become the main enabling component of computer systems. At the same time, incidents in which software vulnerabilities are exploited for attacks, with serious consequences, keep occurring. This poses new challenges for software security, which is becoming an increasingly important problem. Buffer overflow vulnerabilities are frequently exploited in attacks and have become an extremely dangerous class of software vulnerability. A buffer overflow occurs when, at run time, more data is written to a buffer than the buffer can hold. The overflowing data may corrupt the program's stack, causing the program to crash or even to execute an attacker's instructions. The root cause of buffer overflows is that programmers do not perform the necessary bounds checks on buffer operations. Current methods for detecting buffer overflow vulnerabilities fall into two broad categories: static analysis and dynamic testing. Dynamic testing depends heavily on test cases and often incurs additional execution overhead. Static analysis can find vulnerabilities automatically before software is deployed and is therefore widely adopted in industry. However, because it cannot observe the run-time state of buffers and adopts a conservative strategy, static analysis usually produces a large number of false positives. Some of these false positives arise because the analysis fails to recognize safety measures that programmers have deliberately taken to protect buffers. Targeting this class of false positives, this thesis studies precise static analysis and false-positive identification methods. Its specific contributions are as follows.
1. A buffer overflow vulnerability pattern. Based on a study of buffer access operations in C/C++ programs and of instances from real projects, the thesis establishes a vulnerability pattern that covers the distribution of buffer operations that lead to overflow, the mechanisms by which buffer overflow vulnerabilities arise, and the patterns used to fix them manually.
2. A static analysis method for buffer overflows that recognizes active safety measures. Building on the vulnerability pattern, the method adds to the static analysis a detection step for active safety measures in the code that prevent buffer overflows. This reduces the false positives caused by failing to recognize such measures and makes the detection results more precise. Based on this method, the thesis develops the tool BoChecker and evaluates it on 100 real cases. The results show that its false negative rate of 45.00% and false positive rate of 29.1% are both lower than those of the other tools compared.
3. A machine-learning-based method for handling static analysis alerts. The method extracts features from the buffer overflow vulnerability pattern and from the static analysis alerts, and uses a random forest to build a model. The resulting model predicts whether a static analysis alert is a false positive. Based on this method, the thesis develops the tool BoWFilter and evaluates it on 545 Checkmarx alerts. The results show that the tool achieves high prediction accuracy for both false positives and non-false positives, at 92.9% and 88.5% respectively.
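To make the notion of an active safety measure concrete, the following is a minimal sketch in C, not code from the thesis or from BoChecker. It contrasts an unguarded copy with a copy protected by an explicit length check. The names `NAME_CAP`, `copy_unguarded`, and `copy_guarded` are hypothetical and chosen only for illustration. A conservative analyzer that ignores the `if` guard would flag both functions, and the alert on the guarded one would be a false positive of the kind described in the abstract.

```c
#include <stddef.h>
#include <string.h>

#define NAME_CAP 16  /* hypothetical buffer capacity, for illustration only */

/* Unguarded copy: writes past the end of `dst` whenever
 * strlen(src) >= NAME_CAP. An alert on this call is a true positive. */
void copy_unguarded(char dst[NAME_CAP], const char *src) {
    strcpy(dst, src);
}

/* Guarded copy: the explicit length check is the active safety measure.
 * Returns 0 on success, or -1 if `src` (plus its terminating NUL)
 * would not fit in `dst`; in that case `dst` is left untouched. */
int copy_guarded(char dst[NAME_CAP], const char *src) {
    size_t len = strlen(src);
    if (len >= NAME_CAP) {
        return -1;              /* reject oversized input instead of overflowing */
    }
    memcpy(dst, src, len + 1);  /* len + 1 <= NAME_CAP, so this stays in bounds */
    return 0;
}
```

An analysis that tracks the condition `len >= NAME_CAP` along the path to `memcpy` can conclude that the write is bounded, which is the kind of reasoning needed to suppress the spurious alert.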
[Degree-granting institution]: Nanjing University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP309

[References]