Research on Key Technologies of Binary Code Reverse Analysis for Software Security
[Abstract]: Binary code reverse analysis is a program analysis technique that works directly on binary code. It is critical whenever source code is unavailable. Malware authors rarely release source code, so for malware detection and analysis, reverse analysis of the binary is almost the only available means. Software review and plagiarism detection likewise can only analyze the binary code when no source code exists. Reverse analysis can also be used to harden existing software: reducing security vulnerabilities, preventing cracking and piracy, and protecting intellectual property. On smartphones and embedded devices, most software is distributed only in binary form. Studying binary code reverse analysis to improve the security of computer software therefore has significant theoretical and practical value.
Binary code differs greatly from source code, so analyzing it is much harder than analyzing program source code, and obfuscation and compiler optimization increase the difficulty further. In addition, to evade detection and analysis, malware employs various anti-analysis techniques, such as anti-modification based on integrity checks and anti-monitoring based on timing checks. Analyzing such software requires first defeating these anti-analysis measures, which makes binary reverse analysis harder still. This dissertation focuses on four key techniques of binary code reverse analysis: anti-analysis identification, disassembly, function identification, and library function identification.
Existing anti-analysis identification methods are each designed for one specific anti-analysis technique and therefore lack generality. To address this, this dissertation analyzes the conceptual similarity among anti-analysis methods and proposes an anti-analysis identification framework based on information flow. For anti-modification based on code integrity checks, it proposes an identification method based on dynamic information flow that requires no hardware assistance: backward taint analysis first identifies executable memory locations, or memory locations used to compute values at executable locations, and forward taint analysis then identifies the verification routine. For timing-based anti-monitoring, the return values of common time-acquisition instructions and system calls are used as taint sources, and taint analysis again identifies the checking routine. The method successfully identifies the integrity-check-based anti-modification and timing-based anti-monitoring techniques reported in the existing literature, and it reports the locations of the identified anti-analysis points. The basic structural information obtained from this analysis can help analysts design countermeasures against anti-analysis.
Current static disassembly methods can be inaccurate and dynamic ones have low coverage, so a disassembly method based on multi-path exploration is studied. A static disassembler cannot distinguish data from code within executable code regions, while dynamic disassembly has low code coverage because it handles only the paths that were actually executed. This dissertation uses dynamic analysis based on binary instrumentation to record instruction execution traces, and achieves multi-path exploration by inverting conditional branches along the execution path, thereby improving the coverage of dynamic analysis. All execution traces are then simplified, and finally static disassembly is used to find code in the regions that remain unprocessed.
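The branch-inversion idea can be illustrated with a toy model. The sketch below assumes a hypothetical program format and an interpreter, `run`, that stands in for a binary instrumentation tool: it records executed addresses and branch decisions, and it can force the k-th executed conditional branch to go the other way. The equally hypothetical `explore` re-runs the program once for each branch direction not yet seen, keeping the trace's earlier decisions fixed, and collects every address covered. It is a minimal sketch under these assumptions, not the dissertation's implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy program format (not a real ISA): address -> instruction
#   ("op", next_addr)                           straight-line instruction
#   ("br", cond, taken_addr, fallthrough_addr)  cond(x) decides the branch for input x
#   ("ret",)                                    end of execution
Program = Dict[int, tuple]
Decision = Tuple[int, bool]  # (branch address, taken?)


def run(prog: Program, x: int, force: Dict[int, bool],
        max_steps: int = 1000) -> Tuple[List[int], List[Decision]]:
    """Execute from address 0 and record the trace, standing in for an
    instrumentation tool. force[k] overrides the direction of the k-th
    conditional branch executed, modelling branch inversion."""
    pc, executed, decisions = 0, [], []
    for _ in range(max_steps):
        ins = prog[pc]
        executed.append(pc)
        if ins[0] == "ret":
            break
        if ins[0] == "op":
            pc = ins[1]
        else:  # "br"
            k = len(decisions)
            taken = force[k] if k in force else ins[1](x)
            decisions.append((pc, taken))
            pc = ins[2] if taken else ins[3]
    return executed, decisions


def explore(prog: Program, x: int, max_runs: int = 50) -> Set[int]:
    """Multi-path exploration: for every branch direction not yet seen, re-run
    with the trace's earlier decisions fixed and that branch inverted."""
    covered: Set[int] = set()
    seen: Set[Decision] = set()
    work: List[Dict[int, bool]] = [{}]
    runs = 0
    while work and runs < max_runs:
        force = work.pop()
        runs += 1
        executed, decisions = run(prog, x, force)
        covered.update(executed)
        seen.update(decisions)
        for k, (addr, taken) in enumerate(decisions):
            if (addr, not taken) not in seen:
                seen.add((addr, not taken))  # schedule each direction only once
                prefix = {i: d for i, (_, d) in enumerate(decisions[:k])}
                prefix[k] = not taken
                work.append(prefix)
    return covered


# Example: the blocks at 2, 3 and 5 are reached only for some inputs.
demo: Program = {
    0: ("op", 1),
    1: ("br", lambda x: x > 10, 2, 4),
    2: ("op", 3),
    3: ("ret",),
    4: ("br", lambda x: x == 7, 5, 6),
    5: ("op", 6),
    6: ("ret",),
}
print(sorted(run(demo, 0, {})[0]))  # [0, 1, 4, 6]
print(sorted(explore(demo, 0)))     # [0, 1, 2, 3, 4, 5, 6]
```

With input 0, a single run covers only addresses 0, 1, 4 and 6; inverting the two branches also reaches 2, 3 and 5. Because a forced branch ignores the real condition, an inverted run may follow a path that no input would produce.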
This disassembly method achieves both high accuracy and high coverage.
Current function identification methods cannot identify functions that have neither cross-references nor recognizable prologue and epilogue features. To address this, a function identification method that uses return instructions as the identifying feature is studied. Because every function has at least one return instruction through which control leaves the function, return instructions are a more reliable feature than the prologue and epilogue patterns used by traditional methods. First, the concept of a Reverse Extended Control Flow Graph (RECFG) is introduced: for a given code region, the RECFG of a specified return instruction contains all possible control flow graphs that include that instruction. A RECFG-based function identification method is then proposed. It treats every address in the code region that can be decoded as a return instruction as a candidate, performs backward control flow analysis from each candidate, and constructs the corresponding RECFG. Four pruning rules remove nodes and paths that a compiler would not normally generate. Then, for each independent RECFG, a multiple-attribute decision-making method selects one subgraph as the function's control flow graph. This method can accurately identify the functions that may exist in a given code region.
A new library function identification method is also studied. Inlining and optimization make library function code discontiguous and polymorphic, so traditional signature matching based on the first N bytes of a function cannot identify inlined library functions. The concept of an Execution Flow Graph (EFG) is therefore introduced and used to describe inlined code in the binary, and library functions are identified by finding similar EFG subgraphs within the target function.
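The abstract does not define the EFG precisely, so the sketch below assumes a simplified stand-in: each node carries a normalized instruction label, and each edge means one instruction may execute directly after another. Identifying an inlined library function then becomes a search for a copy of the library routine's graph inside the target function's graph. `find_embedding` is a hypothetical helper that does this with plain backtracking (label- and edge-preserving, non-induced subgraph matching); it is not the dissertation's matcher, and a practical implementation would more likely use an optimized algorithm such as VF2.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical simplified EFG (a stand-in, not the dissertation's definition):
# node id -> normalized instruction label, plus directed "may execute next" edges.
Graph = Tuple[Dict[int, str], Set[Tuple[int, int]]]


def find_embedding(pattern: Graph, target: Graph) -> Optional[Dict[int, int]]:
    """Backtracking search for an injective map from pattern nodes to target
    nodes that preserves labels and edges (non-induced subgraph matching).
    Returns the mapping, or None if the pattern does not occur in the target."""
    p_labels, p_edges = pattern
    t_labels, t_edges = target
    order = sorted(p_labels)
    mapping: Dict[int, int] = {}
    used: Set[int] = set()

    def consistent(p: int, t: int) -> bool:
        if p_labels[p] != t_labels[t]:
            return False
        for a, b in p_edges:
            if a != p and b != p:
                continue
            # Image of each endpoint: t for p itself, else its current mapping.
            src = t if a == p else mapping.get(a)
            dst = t if b == p else mapping.get(b)
            if src is not None and dst is not None and (src, dst) not in t_edges:
                return False
        return True

    def extend(i: int) -> bool:
        if i == len(order):
            return True
        p = order[i]
        for t in t_labels:
            if t not in used and consistent(p, t):
                mapping[p] = t
                used.add(t)
                if extend(i + 1):
                    return True
                del mapping[p]
                used.discard(t)
        return False

    return dict(mapping) if extend(0) else None


# Hypothetical library routine: a load/compare/branch/increment loop.
lib_loop: Graph = ({0: "load", 1: "cmp", 2: "jcc", 3: "add"},
                   {(0, 1), (1, 2), (2, 3), (3, 0)})
# Target function with the same loop inlined between a call and a return.
inlined: Graph = ({10: "call", 11: "load", 12: "cmp", 13: "jcc", 14: "add", 15: "ret"},
                  {(10, 11), (11, 12), (12, 13), (13, 14), (14, 11), (13, 15)})
print(find_embedding(lib_loop, inlined))  # {0: 11, 1: 12, 2: 13, 3: 14}
```

The inlined loop at nodes 11 to 14 is found even though it is surrounded by unrelated code; removing the back edge 14 → 11 from the target makes the match fail.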
Five filters are defined to discard subgraphs that cannot match, and a Reduced Execution Flow Graph (REFG) is introduced to speed up subgraph isomorphism testing. Both the EFG and REFG methods achieve higher precision than current state-of-the-art tools and can accurately identify inlined library functions that traditional methods struggle to recognize. Compared with the EFG method, the REFG method keeps the same precision and recall while significantly reducing processing time.
In summary, the methods above address key problems in binary code reverse analysis: identifying anti-analysis techniques such as integrity-check-based anti-modification, improving the coverage of dynamic disassembly, identifying functions that have no cross-references, and quickly identifying library functions. They provide new ideas and methods for solving these problems.
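The abstract names five filters and the REFG without defining them. The sketch below only illustrates the general idea of running cheap necessary conditions before an expensive subgraph isomorphism test; the four filters shown (node count, edge count, label multiset, sorted out-degree sequence), the graph format, and `prefilter` are hypothetical examples, not the dissertation's filters. Each filter is a necessary condition for a label- and edge-preserving embedding of the pattern into the target, so a rejection never discards a true match.

```python
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical simplified EFG: node id -> normalized instruction label, plus directed edges.
Graph = Tuple[Dict[int, str], Set[Tuple[int, int]]]
Filter = Callable[[Graph, Graph], bool]


def node_count(p: Graph, t: Graph) -> bool:
    return len(p[0]) <= len(t[0])


def edge_count(p: Graph, t: Graph) -> bool:
    return len(p[1]) <= len(t[1])


def label_multiset(p: Graph, t: Graph) -> bool:
    # Every label must occur in the target at least as often as in the pattern.
    need, have = Counter(p[0].values()), Counter(t[0].values())
    return all(have[lab] >= n for lab, n in need.items())


def out_degrees(g: Graph) -> List[int]:
    deg = Counter(a for a, _ in g[1])
    return sorted((deg[n] for n in g[0]), reverse=True)


def degree_sequence(p: Graph, t: Graph) -> bool:
    # The i-th largest out-degree in the pattern cannot exceed the i-th largest in the target.
    pd, td = out_degrees(p), out_degrees(t)
    return len(pd) <= len(td) and all(a <= b for a, b in zip(pd, td))


# Ordered roughly from cheapest to most expensive.
FILTERS: List[Tuple[str, Filter]] = [
    ("node_count", node_count),
    ("edge_count", edge_count),
    ("label_multiset", label_multiset),
    ("degree_sequence", degree_sequence),
]


def prefilter(pattern: Graph, targets: Dict[str, Graph]) -> Dict[str, str]:
    """Map each target to "pass" or to the first filter that rejects it.
    Only "pass" targets need the expensive subgraph isomorphism test."""
    return {
        name: next((fname for fname, f in FILTERS if not f(pattern, tgt)), "pass")
        for name, tgt in targets.items()
    }


lib_pattern: Graph = ({0: "load", 1: "cmp", 2: "jcc", 3: "add"},
                      {(0, 1), (1, 2), (2, 3), (3, 0)})
candidates: Dict[str, Graph] = {
    "tiny": ({0: "load", 1: "ret"}, {(0, 1)}),
    "no_add": ({0: "load", 1: "cmp", 2: "jcc", 3: "mov", 4: "ret"},
               {(0, 1), (1, 2), (2, 3), (3, 0), (2, 4)}),
    "inlined": ({10: "call", 11: "load", 12: "cmp", 13: "jcc", 14: "add", 15: "ret"},
                {(10, 11), (11, 12), (12, 13), (13, 14), (14, 11), (13, 15)}),
}
print(prefilter(lib_pattern, candidates))
# {'tiny': 'node_count', 'no_add': 'label_multiset', 'inlined': 'pass'}
```

Only the target marked "pass" goes on to the full isomorphism test; running the cheapest filters first keeps the common rejection case fast.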
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Doctoral
[Year conferred]: 2015
[Classification number]: TP309
[Author]: Qiu Jing