Bug Report and Source Code Summarization Based on Supervised Learning
[Abstract]: When performing software tasks, developers need to interact with software artifacts such as bug reports and source code repositories. To obtain the information they need, they may have to read an entire artifact from start to finish. Extracting valuable information from bug reports and source code in this way is tedious and time-consuming. To make the task more efficient, researchers have proposed generating summaries of software artifacts automatically. In this thesis, to help developers extract the required information efficiently from bug reports and source code repositories, we propose using supervised learning to generate summaries. We use duplicate bug reports to summarize bug reports, as an example of a natural-language text summarization task. In a second study, we summarize source code fragments, as an example of a source-to-source summarization task. For bug reports, we develop a PageRank-based bug report summarization algorithm, called PRST for short. In this algorithm, three similarity measures, based on VSM, Jaccard, and WordNet respectively, are used to compute the similarity between a master bug report and its corresponding duplicate bug reports. Because publicly available bug report corpora do not record the relationship between master bug reports and their duplicates, the information contained in duplicate bug reports cannot be used for bug report summarization with those corpora. We therefore extracted 59 bug reports from the Mozilla, KDE, Gnome, and Eclipse projects and built a separate bug report corpus called OSCAR. We also reconstructed the existing BRC corpus by adding duplicate bug reports and used it as a comparison corpus.
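The abstract does not give the details of PRST, so the following is only a minimal sketch of the general idea it describes: sentences from a master bug report and its duplicates become nodes in a similarity graph, and a PageRank-style iteration scores them. It uses Jaccard similarity alone (not VSM or WordNet), and every name in it (`tokenize`, `jaccard`, `rank_sentences`, `summarize`, the damping value, the sample sentences) is a hypothetical choice for illustration, not the thesis's implementation.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with PageRank over a Jaccard-weighted graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Edge weight = similarity between each pair of distinct sentences.
    weights = [[jaccard(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score,
            # proportional to how similar it is to sentence i.
            incoming = sum(scores[j] * weights[j][i] / out_sums[j]
                           for j in range(n) if out_sums[j] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(master_sentences, duplicate_sentences, k=2):
    """Return the k top-ranked master-report sentences, in original order.

    Assumption: duplicate-report sentences only lend support to master
    sentences through the graph; they are not emitted in the summary.
    """
    scores = rank_sentences(master_sentences + duplicate_sentences)
    master_idx = range(len(master_sentences))
    top = sorted(master_idx, key=lambda i: scores[i], reverse=True)[:k]
    return [master_sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    master = [
        "The browser crashes when opening a PDF attachment.",
        "I had coffee this morning.",
        "Crash happens after clicking the PDF link in mail.",
    ]
    duplicates = [
        "Opening a PDF attachment crashes the browser.",
        "Browser crash on PDF link click.",
    ]
    for sentence in summarize(master, duplicates, k=2):
        print(sentence)
```

In this toy input the off-topic sentence shares no words with any other sentence, so it receives only the baseline score and is left out of the two-sentence summary. Swapping `jaccard` for a cosine similarity over term vectors would give a VSM-style variant of the same sketch.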
We evaluate the effectiveness of the proposed algorithm with several statistical evaluation metrics: Precision, Recall, F-Score, and Pyramid Precision. The results show that the proposed algorithm produces relatively accurate bug report summaries and improves on the accuracy of existing supervised bug report summarization methods. Similarly, to generate source code summaries, we develop a code fragment summarization algorithm based on SVM and NB classifiers (Code Fragment Summarization, CFS) that automatically generates source-to-source summaries of source code fragments. Within the software artifact summarization paradigm, we first introduce a data-driven, small-scale crowdsourcing method to help extract syntactic features of the source code. We retrieved 127 code fragments from the official Eclipse and NetBeans FAQs and built a code fragment corpus for testing. We again use the evaluation metrics listed above and compare against existing methods to verify the effectiveness of the proposed method. The results show that our code fragment extractor is more accurate than existing code fragment summarization methods, and that syntactic features have an important impact on the accuracy of the generated summaries. The generated summaries can help developers complete the software tasks at hand and improve the performance and quality of the software.
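As a small illustration of three of the metrics named in the abstract, the sketch below computes Precision, Recall, and F-Score for an extractive summary by comparing the set of selected sentence (or code line) identifiers with a reference set. The function name `precision_recall_f` and the sample identifiers are assumptions for illustration; Pyramid Precision is not covered, and the abstract does not say how the thesis computes any of these values.

```python
def precision_recall_f(selected, reference):
    """Return (precision, recall, F-score) for two collections of item ids.

    selected  -- ids of items the summarizer chose
    reference -- ids of items in the gold-standard summary
    """
    selected, reference = set(selected), set(reference)
    true_positives = len(selected & reference)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score here is the balanced harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    # Hypothetical sentence ids: 3 of the 4 chosen sentences are in the
    # 5-sentence reference summary.
    p, r, f = precision_recall_f({1, 3, 5, 8}, {1, 2, 3, 5, 9})
    print(f"precision={p:.2f} recall={r:.2f} f={f:.2f}")
```

With the sample ids, three of four selected items are correct (precision 0.75) and three of five reference items are recovered (recall 0.60).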
[Degree-granting institution]: Dalian University of Technology
[Degree level]: Doctoral
[Year degree awarded]: 2016
[Classification number]: TP311.5; TP391.1