A Code Obfuscation System Based on Control Obfuscation and Layout Obfuscation

Published: 2018-03-19 19:54

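Topic: code obfuscation; Focus: layout obfuscation; Source: University of Science and Technology of China, master's thesis, 2017; Document type: degree thesis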

[Abstract]: With the rapid development of Internet technology and the computer software industry, software has become easier to obtain and use. At present, a large amount of paid software is cracked by attackers and published online for free use. Once software has been cracked, attackers may steal its core code or other important information, which makes the software's intellectual property hard to protect and limits the growth of the software industry. In recent years, more and more code has been written in Python, because its syntax is concise and clear and its powerful, rich libraries can substantially improve developer productivity. However, Python bytecode is easily reversed by existing decompilation tools, so key information in it can be stolen. To address these problems, and building on a survey of existing code obfuscation techniques, this thesis designs and implements a Python code obfuscation system based on layout obfuscation and control obfuscation. The system makes Python code harder for an attacker to crack. The main work is as follows:

(1) Analysis of code obfuscation techniques. The thesis first introduces the four categories of code obfuscation (layout obfuscation, data obfuscation, preventive obfuscation, and control obfuscation) and lists the concrete methods in each category. It then describes the four evaluation criteria for code obfuscation, which are used later to evaluate the Python code obfuscation system designed in the thesis.

(2) Analysis of the opaque predicates and the control-flow flattening algorithm needed for the control obfuscation used in the thesis. First, the problems of the Logistic and En_Logistic chaotic maps are analyzed, and the piecewise Logistic chaotic map used in the obfuscation system is presented. It overcomes two shortcomings of the earlier maps: insufficient sensitivity to the initial value and a tendency for values to cluster in small regions. Then the definition of an opaque predicate is introduced, and a method for constructing chaotic opaque predicates from a chaotic map is given. Finally, the defects of existing control-flow flattening algorithms are analyzed.

(3) Implementation of the Python code obfuscation system. First, the layout obfuscation used in the system is introduced. Next, an opaque predicate algorithm based on congruence equations and an N-state opaque predicate algorithm based on the piecewise Logistic chaotic map are proposed and implemented, and the insertion of congruence-based opaque predicates into code is analyzed. Then an improved control-flow flattening algorithm is proposed that overcomes the defects of the existing algorithms. Finally, the control obfuscation and layout obfuscation algorithms are implemented in the system.

(4) Experiments and analysis. The system is tested with open-source test cases on the proposed experimental platform and analyzed for correctness, security, and effectiveness. The experimental results show that the system preserves the correctness of the obfuscated code. Compared with existing opaque predicate generation algorithms, it produces better-balanced outputs and incurs lower additional time overhead when computing opaque predicate outputs. It also achieves high security, potency, resilience, and stealth, and effectively increases the difficulty of cracking the code, at the cost of some time and space overhead.
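
As a rough illustration of the opaque-predicate ideas summarized in the abstract, the Python sketch below shows an always-true predicate derived from a congruence, an N-state value derived from a chaotic map, and one way a predicate can guard a dead branch. It is not the thesis's algorithm. The congruence x(x+1) ≡ 0 (mod 2), the classic logistic map with mu = 4 (a stand-in for the piecewise Logistic map, whose definition is not given in the abstract), and all function names are assumptions chosen for illustration.

```python
def opaque_true(x: int) -> bool:
    """Always True for any integer x (illustrative congruence, an assumption).

    x*x + x = x*(x + 1) is a product of two consecutive integers,
    so it is always even, i.e. congruent to 0 modulo 2.
    """
    return (x * x + x) % 2 == 0


def chaotic_state(x0: float, n_states: int, steps: int = 50, mu: float = 4.0) -> int:
    """Map a seed in (0, 1) to one of n_states buckets.

    Uses the classic logistic map x -> mu * x * (1 - x) as a stand-in;
    the thesis uses a piecewise Logistic map that is not reproduced here.
    """
    x = x0
    for _ in range(steps):
        x = mu * x * (1.0 - x)
    # Clamp so that rounding at the interval ends cannot leave the valid range.
    return max(0, min(n_states - 1, int(x * n_states)))


def total(values):
    """Original, unobfuscated function: sum of integers."""
    result = 0
    for v in values:
        result += v
    return result


def total_obfuscated(values):
    """Same behavior as total(), with an opaque predicate guarding dead code."""
    result = 0
    for v in values:
        if opaque_true(v):   # always taken, but not obvious to a reader
            result += v
        else:                # never executed; present only to mislead analysis
            result -= 3 * v
    return result
```

A static analyzer that cannot prove the congruence has to treat both branches as reachable, which is the effect such predicates aim for. The chaotic variant is deterministic for a fixed seed, so the obfuscator knows which of the N states will occur while a reader of the code does not see it directly.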
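[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP311.5;TP309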

[References]