A Method for Handling Missing Software Effort Data Based on Naive Bayes and the EM Algorithm
Published: 2019-05-27 10:39
[Abstract]: Missing data is a persistent problem in software effort estimation. Building on the naive Bayes model and the EM (expectation maximization) algorithm, this paper proposes methods for handling missing software effort data. First, it describes the underlying mechanisms of missing data. It then presents a software effort prediction method based on naive Bayes and the EM algorithm. Next, it proposes two methods for handling missing effort data: tolerating the missing data and imputing the missing data. Finally, it validates the proposed methods on the ISBSG and CSBSG software effort data sets. The experimental results show that both proposed methods outperform the MINI imputation method combined with an SVM classifier; that the imputation method predicts better than the toleration method; and that, for both data sets, the effort data in each class come from a single Gaussian component.
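The two handling strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the toy data, and the use of `None` to mark a missing value are all assumptions made here for illustration. It fits one Gaussian per class and feature, "tolerates" missing values by leaving them out of the likelihood product, and "imputes" a missing value with its posterior-weighted class mean. That imputation is a single expectation-style fill, not the iterative EM procedure the paper describes.

```python
import math

def fit_gaussian_nb(rows, labels):
    """Estimate class priors and per-feature (mean, variance), skipping None.

    Assumes every class has at least one observed value for every feature.
    """
    model = {}
    for c in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == c]
        stats = []
        for j in range(len(rows[0])):
            observed = [r[j] for r in members if r[j] is not None]
            mean = sum(observed) / len(observed)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in observed) / len(observed) + 1e-6
            stats.append((mean, var))
        model[c] = (len(members) / len(rows), stats)
    return model

def log_gauss(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def posterior(model, row):
    """Class posterior that uses only the observed features of `row`."""
    scores = {}
    for c, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            if x is not None:  # tolerate: a missing feature adds no term
                score += log_gauss(x, mean, var)
        scores[c] = score
    top = max(scores.values())
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

def predict(model, row):
    """Most probable class for a possibly incomplete row."""
    post = posterior(model, row)
    return max(post, key=post.get)

def impute(model, row):
    """Fill each missing entry with its posterior-weighted class mean."""
    post = posterior(model, row)
    return [
        x if x is not None
        else sum(post[c] * model[c][1][j][0] for c in model)
        for j, x in enumerate(row)
    ]

# Hypothetical toy data: two features, two effort classes, None = missing.
rows = [[1.0, 10.0], [1.2, None], [0.8, 9.0],
        [5.0, 50.0], [None, 52.0], [5.2, 48.0]]
labels = ["low", "low", "low", "high", "high", "high"]

model = fit_gaussian_nb(rows, labels)
print(predict(model, [1.1, None]))
print(impute(model, [1.1, None]))
```

Skipping a missing feature in the product is valid here only because naive Bayes treats features as conditionally independent given the class, so integrating out an unobserved feature leaves the remaining terms unchanged.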
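[Affiliation]: School of Economics and Management, Beijing University of Chemical Technology
[Funding]: National Natural Science Foundation of China (61379046, 61432001, 91218302); Fundamental Research Funds for the Central Universities (buctrc201504)
[Classification number]: O212
Article ID: 2486078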
Article link: https://www.wllwen.com/kejilunwen/yysx/2486078.html