一种基于抽象语法树的同源性比对的优化算法
发布时间:2018-11-03 06:57
【摘要】:在互联网高速发展的今天,计算机软件越来越多的融入了我们的生活,开源软件源代码的不断增长也是这个行业快速发展的原动力之一,但是在这样的背景下,也产生了源代码抄袭、侵犯软件著作权等一系列问题,这时我们看到软件同源性比对技术的研究是十分必要的。目前比较主流的软件同源性比对技术主要有基于文本的、基于Token的、基于图的等等,基于文本和Token的比对方法没有考虑到源代码的结构,使用起来比较局限,而基于图的比对方法在图的生成方面又显得相对复杂,本文基于源代码的抽象语法树,提出了一种考虑到代码结构但又相对简单的同源性比对算法。本文首先对同源性比对技术进行了简单的介绍,随后对基于抽象语法树的同源性比对算法进行了比较细致的分析。接着提出了本文的索引存储结构,通过将比对过程转换成两级来减少比对次数,并使用实际数据说明了这种改进的可行性以及对算法比对次数的有效减少。考虑到目前计算机的CPU都是多核的,本文又提出了一种多线程的优化方案,从而实现对CPU的充分利用,进而实现对同源性比对效率的提升,通过这两种优化方案,本文实现了对基于抽象语法树的同源性比对算法的优化,随后的章节设计实现了优化算法的各个功能模块。最后通过将算法的工程实现与比较出名的同类软件进行了一些准确率的比对测试,表明算法的可靠性,然后对多线程的优化方案也进行了实际的效果验证。紧接着,本文对后续的研究做出了一些建设性的展望与分析。综上所述,本文对同源性比对方面的相关研究有一定的帮助。
[Abstract]:With the rapid development of the Internet, computer software is more and more integrated into our life. The increasing source code of open source software is also one of the driving forces of the rapid development of the industry, but in this context, It also produces a series of problems, such as plagiarism of source code, infringement of software copyright and so on. At this time, we see that the research of software homology comparison technology is very necessary. At present, the mainstream software homology comparison techniques are mainly text-based, Token based, graph-based and so on. The comparison method based on text and Token does not take into account the structure of source code, so its use is relatively limited. In this paper, based on the abstract syntax tree of the source code, a simple homology matching algorithm considering the code structure is proposed. In this paper, the homology alignment technology is introduced briefly, and then the algorithm based on abstract syntax tree is analyzed in detail. Then the index storage structure of this paper is proposed to reduce the number of times of comparison by converting the comparison process into two levels. The feasibility of the improvement and the effective reduction of the times of comparison of the algorithm are illustrated by using the actual data. Considering that the current computer CPU is multi-core, this paper proposes a multi-thread optimization scheme, which can make full use of CPU and improve the efficiency of homology comparison. In this paper, the algorithm of homology alignment based on abstract syntax tree is optimized, and each functional module of the optimization algorithm is designed and implemented in the following chapters. Finally, the engineering implementation of the algorithm is compared with the well-known software of the same kind of software, which shows the reliability of the algorithm, and then the multi-thread optimization scheme is verified. Then, this paper makes some constructive outlook and analysis on the follow-up research. To sum up, this paper is helpful to the research of homology comparison.
【学位授予单位】:北京邮电大学
【学位级别】:硕士
【学位授予年份】:2016
【分类号】:TP311.5
本文编号:2307039
[Abstract]:With the rapid development of the Internet, computer software is more and more integrated into our life. The increasing source code of open source software is also one of the driving forces of the rapid development of the industry, but in this context, It also produces a series of problems, such as plagiarism of source code, infringement of software copyright and so on. At this time, we see that the research of software homology comparison technology is very necessary. At present, the mainstream software homology comparison techniques are mainly text-based, Token based, graph-based and so on. The comparison method based on text and Token does not take into account the structure of source code, so its use is relatively limited. In this paper, based on the abstract syntax tree of the source code, a simple homology matching algorithm considering the code structure is proposed. In this paper, the homology alignment technology is introduced briefly, and then the algorithm based on abstract syntax tree is analyzed in detail. Then the index storage structure of this paper is proposed to reduce the number of times of comparison by converting the comparison process into two levels. The feasibility of the improvement and the effective reduction of the times of comparison of the algorithm are illustrated by using the actual data. Considering that the current computer CPU is multi-core, this paper proposes a multi-thread optimization scheme, which can make full use of CPU and improve the efficiency of homology comparison. In this paper, the algorithm of homology alignment based on abstract syntax tree is optimized, and each functional module of the optimization algorithm is designed and implemented in the following chapters. Finally, the engineering implementation of the algorithm is compared with the well-known software of the same kind of software, which shows the reliability of the algorithm, and then the multi-thread optimization scheme is verified. Then, this paper makes some constructive outlook and analysis on the follow-up research. To sum up, this paper is helpful to the research of homology comparison.
【学位授予单位】:北京邮电大学
【学位级别】:硕士
【学位授予年份】:2016
【分类号】:TP311.5
【参考文献】
相关期刊论文 前4条
1 熊浩;晏海华;赫建营;赵长海;;一种基于静态词法树的程序相似性检测方法[J];计算机应用研究;2009年04期
2 张鹏;王国胤;陶春梅;罗海;;基于本体粗糙集的程序代码相似度度量方法[J];重庆邮电大学学报(自然科学版);2008年06期
3 赵长海;晏海华;金茂忠;;基于编译优化和反汇编的程序相似性检测方法[J];北京航空航天大学学报;2008年06期
4 张文典,任冬伟;程序抄袭判定系统[J];小型微型计算机系统;1988年10期
相关硕士学位论文 前1条
1 王春晖;程序代码抄袭检测中串匹配算法的研究与实现[D];内蒙古师范大学;2008年
,本文编号:2307039
本文链接:https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2307039.html