Research on a Fault Localization Method That Builds Program Entity Suspiciousness with Ridge Regression

Published: 2018-10-23 17:19

[Abstract]: Locating faults in software, that is, finding the lines that contain the erroneous statements, is the most time-consuming part of the software development cycle, so research on automated or assisted fault localization techniques is well justified. Among existing fault localization techniques, those based on program spectra are the most efficient in practice. However, this approach has two clear shortcomings. (1) It computes the suspiciousness of program entities with one fixed formula for all programs, yet a particular program, influenced by objective factors such as its test cases and implementation language, may fail in ways that differ from other programs. (2) A faulty statement often affects the statements executed after it, but the technique ignores such associations between program entities, so it cannot correctly assess the suspiciousness of some statements, which degrades localization. To address these problems, this thesis proposes using kernel ridge regression to learn a model of the relationship between program entities and test case outcomes. Ridge regression is a machine learning method that can cope with multicollinearity among features, so a suspiciousness computation specific to each program can be built. Without loss of generality, statements are taken as the program entities, the number of times each statement is covered during execution as the features, and the outcome of each test case as the target value. The statement coverage and outcome of each test run form the training matrix. The fitted curve obtained by training kernel ridge regression on these samples is then used, together with virtual vectors, to assess how likely each program entity is to contain a fault.
Because the binary targets used in ridge regression are not fully suited to the requirements of software fault localization, the thesis also proposes a smoothing procedure for test case outcomes to improve the construction of the fitted curve. Finally, the thesis designs a detailed experimental framework, establishes an automated fault localization workflow, and, following the evaluation criteria used for existing fault localization methods, verifies on a commonly used test suite that building entity suspiciousness with ridge regression is an effective way to assess how likely program entities are to contain faults.
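The pipeline described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis implementation: the RBF kernel, the values of `lam` and `gamma`, the function names, and the toy coverage matrix are all hypothetical. In particular, the abstract does not define its virtual vectors, so the sketch assumes each one is a hypothetical run that covers exactly one statement. The outcome smoothing step is omitted because the abstract does not specify it.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF kernel between two coverage vectors (the kernel choice is an assumption)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def rank_statements(coverage, failed, lam=0.1, gamma=0.5):
    """Score each statement with kernel ridge regression.

    coverage[i][j]: how often test i executed statement j (the features).
    failed[i]: 1 if test i failed, 0 if it passed (the targets, unsmoothed).
    Returns one suspiciousness score per statement.
    """
    n_tests, n_stmts = len(coverage), len(coverage[0])
    # Dual form of kernel ridge regression: alpha = (K + lam * I)^-1 y
    K = [[rbf_kernel(coverage[i], coverage[j], gamma) + (lam if i == j else 0.0)
          for j in range(n_tests)] for i in range(n_tests)]
    alpha = solve(K, [float(y) for y in failed])
    scores = []
    for s in range(n_stmts):
        # Assumed "virtual vector": a run that covers only statement s once.
        virtual = [1.0 if j == s else 0.0 for j in range(n_stmts)]
        scores.append(sum(a * rbf_kernel(virtual, row, gamma)
                          for a, row in zip(alpha, coverage)))
    return scores

# Toy data: statement 0 runs in every test; only the failing test runs statement 2.
coverage = [[1, 1, 0, 0],
            [1, 0, 1, 0],
            [1, 0, 0, 1]]
failed = [0, 1, 0]
scores = rank_statements(coverage, failed)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranking)
```

On this toy input the statement executed only by the failing test receives the highest score. Because the model is refit for each program, the scoring adapts to that program's own coverage data, unlike a single fixed spectrum formula.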
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP311.5

[References]