Research on Key Techniques of Binary Code Reverse Analysis for Software Security

Published: 2018-08-04 15:34

【Abstract】: Binary code reverse analysis is a program analysis technique that operates on binary code. It is essential when source code is unavailable. In malware detection and analysis, for example, malware authors rarely release source code, so binary code reverse analysis is almost the only means of analysis. In security audits and plagiarism detection of commercial software, the lack of source code likewise leaves binary analysis as the only option. Binary code reverse analysis can also be applied to harden existing software and reduce security vulnerabilities, and to prevent software from being cracked or pirated, thereby protecting intellectual property. Today, on everything from supercomputers to smartphones and embedded devices, the vast majority of software is released as binary code. Research on binary code reverse analysis therefore has both scientific and practical value for improving the security of computer software.
Because binary code differs greatly from source code, binary code reverse analysis is far harder than source-level program analysis. Obfuscation and compiler optimization add to the difficulty. In addition, to protect itself from detection and analysis, malware employs various anti-analysis methods, such as anti-modification based on integrity checking and anti-monitoring based on timing attacks. Analyzing such software requires defeating these anti-analysis measures, which further increases the difficulty of binary code reverse analysis. This thesis studies in depth several key techniques: identification of anti-analysis in binary code, disassembly, function identification, and library function identification.
Existing work on anti-analysis identification designs a specific identification method for each specific type of anti-analysis and therefore lacks generality. To address this, the thesis analyzes the conceptual similarities among anti-analysis methods and proposes an information-flow-based anti-analysis identification framework. Existing countermeasures against integrity-check-based anti-modification require hardware assistance and cannot handle self-modifying code, so the thesis studies an identification method based on dynamic information flow that needs no hardware assistance. It first uses backward taint analysis to identify executable memory locations, or memory locations used to compute the values of executable locations, and then uses forward taint analysis to identify the checking procedure. The same approach applies to timing-attack-based anti-monitoring: the return values of common instructions and system calls that obtain the time are taken as taint sources, and taint analysis is then used to identify the verification procedure. The method successfully identifies the integrity-check-based anti-modification and timing-attack-based anti-monitoring techniques described in the existing research literature, and it provides basic structural information about the identified anti-analysis, which helps analysts design countermeasures.
Disassembly methods that combine dynamic and static analysis still suffer from low coverage. To address this, the thesis studies a multi-path exploration method for disassembling code. Static disassembly cannot distinguish data from code within executable regions, nor can it handle self-modifying code. Dynamic disassembly has low code coverage because it processes only the paths that were executed. The thesis uses dynamic analysis based on binary instrumentation to record instruction execution traces, and achieves multi-path exploration by inverting conditional branches on already executed paths, thereby raising the coverage of dynamic analysis. All execution traces are then reduced and merged. Finally, static disassembly is used to discover code in the regions not yet processed. The method disassembles binary code with high accuracy and high coverage.
Current function identification methods cannot identify functions that have no cross-references and no prologue or epilogue signatures. To address this, the thesis studies a function identification method that uses function return instructions as the identifying feature. Because a function has at least one return instruction through which control flow leaves the function, the return instruction is a more reliable feature than the prologue and epilogue signatures used by traditional methods. The thesis first introduces the concept of the Reverse Extended Control Flow Graph (RECFG): the set of all possible control flow graphs, within a given code region, that contain a specified return instruction. It then proposes an RECFG-based function identification method. The method starts from every address in a code region that can be interpreted as a return instruction and analyzes the control flow graph backward to construct the RECFG. Four graph pruning rules are designed to remove nodes and paths that a compiler would not normally generate. Then, for each independent RECFG, a multiple-attribute decision-making method selects one subgraph as the control flow graph of the function. The method accurately identifies the possible functions in a given code region.
Traditional library function identification methods cannot identify inlined library functions. To address this, the thesis studies a new method for identifying library functions. Because inlined and optimized library functions are discontinuous and polymorphic, traditional signature matching on the first n bytes of a function cannot identify inlined functions. The thesis first introduces the concept of the Execution Flow Graph (EFG) to describe the intrinsic behavioral characteristics of binary code, and then identifies library functions by finding similar EFG subgraphs in the target function. Discontinuous inlined library functions in the target function are identified by analyzing the execution dependencies among their instructions. Polymorphic inlined library functions produced by compiler optimization are identified through instruction normalization. Because subgraph isomorphism testing is very time-consuming, the thesis defines five filters that discard subgraphs that cannot match, and introduces the Reduced Execution Flow Graph (REFG) to speed up subgraph isomorphism testing. Both the EFG and REFG methods achieve higher precision than current state-of-the-art tools and accurately identify inlined library functions that traditional methods struggle to identify. Compared with EFG, REFG significantly reduces processing time while maintaining the same precision and recall.
In summary, these methods offer new ideas and approaches for several key technical problems: identifying anti-analysis, including integrity-check-based anti-modification; improving the coverage of dynamic disassembly; identifying functions with no cross-references and no distinct prologue or epilogue signatures; and quickly identifying library functions.
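
The forward taint step that the abstract describes for timing-based anti-monitoring can be illustrated with a minimal sketch. It is not the thesis's implementation: the `Insn` record, the register and flag names, the `TIME_SOURCES` set, and the function `find_timing_checks` are all hypothetical, and the trace is a hand-written toy rather than output from a binary instrumentation tool. The sketch marks the result of a time-reading instruction as tainted, propagates taint through data movement and arithmetic, and reports any conditional branch whose flags depend on tainted data.

```python
from typing import List, NamedTuple, Optional, Set, Tuple


class Insn(NamedTuple):
    """One entry of a simplified execution trace (hypothetical format)."""
    addr: int
    op: str
    dst: Optional[str]       # location written, or None
    srcs: Tuple[str, ...]    # locations read


# Assumption: only rdtsc is treated as a time source in this toy example.
TIME_SOURCES = {"rdtsc"}


def find_timing_checks(trace: List[Insn]) -> List[int]:
    """Return addresses of conditional branches that depend on time values."""
    tainted: Set[str] = set()
    checks: List[int] = []
    for insn in trace:
        if insn.op in TIME_SOURCES:
            # Taint source: the location receiving the timestamp.
            tainted.add(insn.dst)
        elif insn.op == "jcc":
            # A branch on tainted flags is a candidate timing check.
            if "flags" in tainted:
                checks.append(insn.addr)
        elif any(src in tainted for src in insn.srcs):
            # Propagate taint from any tainted operand to the destination.
            tainted.add(insn.dst)
        else:
            # Destination overwritten with untainted data.
            tainted.discard(insn.dst)
    return checks


# Toy trace: two timestamp reads, their difference compared to a threshold,
# followed by an unrelated comparison and branch.
TRACE = [
    Insn(0x1000, "rdtsc", "eax", ()),
    Insn(0x1002, "mov", "ebx", ("eax",)),
    Insn(0x1004, "mov", "ecx", ()),
    Insn(0x1006, "rdtsc", "eax", ()),
    Insn(0x1008, "sub", "eax", ("eax", "ebx")),
    Insn(0x100A, "cmp", "flags", ("eax",)),
    Insn(0x100C, "jcc", None, ("flags",)),
    Insn(0x100E, "cmp", "flags", ("ecx",)),
    Insn(0x1010, "jcc", None, ("flags",)),
]

print([hex(a) for a in find_timing_checks(TRACE)])  # prints ['0x100c']
```

Only the branch at `0x100c` is reported, because the second comparison overwrites the flags with untainted data. A practical analysis would also need to track memory locations and system call return values, which this sketch leaves out.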
【Degree-granting institution】: Harbin Institute of Technology
【Degree level】: Doctoral
【Year degree conferred】: 2015
【Classification number】: TP309

【Similar literature】