
Research on Stopping Criteria and Evaluation Measures for Active Learning

Published: 2018-09-11 19:43
[Abstract]: Active learning is one of the most active research directions in machine learning. It aims to obtain a high-performing classification model at the lowest possible human labeling cost, so whether an appropriate stopping criterion can be defined largely determines whether active learning delivers its full benefit. In addition, evaluating the performance of an active learning algorithm usually requires quantitative evaluation measures, an issue that previous work has overlooked. This thesis studies these two problems. It first introduces several commonly used active learning stopping criteria. Then, to address the shortcoming that the existing selected accuracy stopping criterion applies only to batch-mode labeling, it proposes an improved selected accuracy stopping criterion for the setting in which a single instance is labeled per round. The criterion approximates the selected accuracy by monitoring how well the predicted labels match the true labels over a fixed number of learning rounds counted back from the current round: the higher the match rate, the higher the selected accuracy. A sliding time window tracks this selected accuracy in real time, and the active learning algorithm stops once the selected accuracy exceeds a preset threshold. Taking active learning based on support vector machines as an example, the effectiveness and feasibility of the criterion are verified on 6 benchmark datasets. The results show that, with a suitable threshold, the criterion finds a reasonable point at which to stop active learning. The method extends the applicability of the selected accuracy stopping criterion and improves its practicality. Many active learning algorithms now exist, but they all share one unified performance evaluation measure, the learning curve.
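As a rough illustration of the single-instance selected accuracy criterion summarized above, the following Python sketch keeps a sliding window of match results between the label the model predicted for each queried instance and the label the annotator then supplied. The class name `SelectedAccuracyStopper`, the parameters `window_size` and `threshold`, their default values, and the choice to wait until the window is full before testing the threshold are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import deque


class SelectedAccuracyStopper:
    """Sliding-window estimate of selected accuracy, one query per round.

    Hypothetical helper: the names and defaults are illustrative only.
    """

    def __init__(self, window_size=10, threshold=0.9):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.threshold = threshold
        # deque(maxlen=...) silently drops the oldest round when full.
        self._matches = deque(maxlen=window_size)

    def selected_accuracy(self):
        """Fraction of recent rounds whose prediction matched the true label."""
        if not self._matches:
            return 0.0
        return sum(self._matches) / len(self._matches)

    def should_stop(self):
        # Assumption: only judge once the window holds window_size rounds.
        window_full = len(self._matches) == self.window_size
        return window_full and self.selected_accuracy() > self.threshold

    def update(self, predicted_label, true_label):
        """Record one labeling round and report whether to stop."""
        self._matches.append(predicted_label == true_label)
        return self.should_stop()


# Toy run: (predicted label, label returned by the annotator) per round.
rounds = [(1, 0), (0, 0), (1, 1), (1, 1), (0, 0)]
stopper = SelectedAccuracyStopper(window_size=4, threshold=0.7)
stop_round = None
for round_index, (predicted, true) in enumerate(rounds, start=1):
    if stopper.update(predicted, true):
        stop_round = round_index
        break
print("stopped at round", stop_round)
```

In a real loop, `predicted` would come from the current classifier (for example an SVM) applied to the queried instance before its label is requested, and `true` from the annotator. A larger window smooths the estimate but delays the earliest possible stopping round.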
Over the whole active learning iteration process, the learning curve distinguishes performance differences between classification models well, so most papers use it as the standard for comparing algorithms. For two active learning algorithms with similar classification performance, however, subtle performance differences are hard to see from the learning curves alone. To address this, the thesis mines the information hidden in the learning curve and proposes four quantitative measures of active learning performance: the area under the learning curve, the logarithmic area under the learning curve, the average gradient angle, and the logarithmic average gradient angle. When comparing active learning algorithms built on homogeneous classifiers, all four measures give a fair evaluation. When the classifiers are heterogeneous, the average gradient angle and the logarithmic average gradient angle may be more suitable than the other two measures. In addition, the logarithmic area under the learning curve and the logarithmic average gradient angle place more weight on the rate of performance improvement in the initial stage of active learning. Extensive experiments on 9 datasets with several baseline active learning algorithms verify the practicality of the four measures.
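The abstract names the four measures but does not give their formulas, so the following Python sketch shows only one plausible reading and should not be taken as the thesis's definitions. It assumes a learning curve given as labeled-set sizes `n_labeled` with matching `accuracy` values, rescales the x axis to [0, 1], uses the trapezoidal rule for the area, takes the "gradient angle" of a segment to be the arctangent of its slope, and uses a base-2 logarithm of the x axis for the logarithmic variants. All function names here are hypothetical.

```python
import math


def _scaled_x(n_labeled, log_scale):
    """Map labeled-set sizes to [0, 1], optionally on a log2 axis."""
    xs = [float(n) for n in n_labeled]
    if len(xs) < 2 or any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("need at least two strictly increasing x values")
    if log_scale:
        if xs[0] <= 0:
            raise ValueError("log scale requires positive x values")
        xs = [math.log2(x) for x in xs]
    lo, hi = xs[0], xs[-1]
    return [(x - lo) / (hi - lo) for x in xs]


def area_under_learning_curve(n_labeled, accuracy, log_scale=False):
    """Trapezoidal area under the curve on a [0, 1] x axis (assumed form)."""
    if len(n_labeled) != len(accuracy):
        raise ValueError("n_labeled and accuracy must have equal length")
    xs = _scaled_x(n_labeled, log_scale)
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs, xs[1:], accuracy, accuracy[1:]))


def average_gradient_angle(n_labeled, accuracy, log_scale=False):
    """Mean segment angle in degrees on a [0, 1] x axis (assumed form)."""
    if len(n_labeled) != len(accuracy):
        raise ValueError("n_labeled and accuracy must have equal length")
    xs = _scaled_x(n_labeled, log_scale)
    angles = [math.atan2(y1 - y0, x1 - x0)
              for x0, x1, y0, y1 in zip(xs, xs[1:], accuracy, accuracy[1:])]
    return math.degrees(sum(angles) / len(angles))


# Two made-up curves that reach the same final accuracy at different speeds.
sizes = [1, 2, 4, 8]
fast = [0.50, 0.70, 0.80, 0.85]
slow = [0.50, 0.55, 0.65, 0.85]
for name, curve in (("fast", fast), ("slow", slow)):
    print(name,
          round(area_under_learning_curve(sizes, curve), 4),
          round(area_under_learning_curve(sizes, curve, log_scale=True), 4),
          round(average_gradient_angle(sizes, curve), 2),
          round(average_gradient_angle(sizes, curve, log_scale=True), 2))
```

Under these assumptions, taking the logarithm of the x axis stretches the early rounds, which is one way a logarithmic measure could weight the initial learning stage more heavily, as the abstract describes. The toy accuracy values are invented for the example and are not results from the thesis.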
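[Degree-granting institution]: Jiangsu University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP181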

[Citation]: 杨菊 (Yang Ju); Research on Stopping Criteria and Evaluation Measures for Active Learning [D]; Jiangsu University of Science and Technology; 2016


