Research on Active Learning Based on ELM
Published: 2018-05-13 05:09
Topics: active learning + extreme learning machine; Source: Hebei University, 2017 master's thesis
[Abstract]: The real world contains large amounts of unlabeled data, such as medical images, web pages, and video. In the era of big data this is even more pronounced. Labeling such data is very costly. Active learning is an effective way to address this problem and has been one of the research hotspots in machine learning and data mining in recent years. Within a classification framework, this thesis studies active learning based on the online sequential extreme learning machine (OS-ELM). Its contributions are twofold. (1) It studies how the distribution of the random weights affects the performance of the extreme learning machine (ELM) and draws the following conclusions: (a1) for different problems or data sets, random weights drawn uniformly from the interval [-1, 1] are not necessarily the optimal choice; (a2) initializing the input-layer weights and hidden-node biases with random numbers drawn from a uniform distribution or from a Gaussian distribution yields test accuracies with no essential difference. (2) It proposes an active learning algorithm based on OS-ELM. The proposed algorithm has three advantages: (b1) it exploits the incremental learning of OS-ELM, which can significantly improve the efficiency of the learning system; (b2) it uses instance entropy as a heuristic to measure the importance of unlabeled instances, a measure that fully characterizes the amount of information an instance contributes to classification; (b3) it uses a K-nearest-neighbor classifier as the oracle that labels the selected unlabeled instances, and this K-NN oracle is independent of the classifier used to evaluate instance importance. Experimental results show that the proposed algorithm learns quickly and labels accurately.
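The selection and labeling steps described in (b2) and (b3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function names, the toy data, and the use of a softmax to turn raw classifier outputs into class probabilities are all assumptions made for illustration, since the abstract does not say how the probabilities are obtained. The OS-ELM itself is not implemented here; its raw outputs are stood in for by hand-written numbers.

```python
import math
from collections import Counter


def softmax(scores):
    """Map raw classifier outputs to probabilities (an assumption, see lead-in)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def instance_entropy(probs):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_informative(outputs, k=1):
    """Return indices of the k pool instances with the highest entropy."""
    entropies = [instance_entropy(softmax(o)) for o in outputs]
    ranked = sorted(range(len(outputs)), key=lambda i: entropies[i], reverse=True)
    return ranked[:k]


def knn_oracle(x, labeled_x, labeled_y, k=3):
    """Label x by majority vote among its k nearest labeled neighbors."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    order = sorted(range(len(labeled_x)), key=lambda i: sq_dist(x, labeled_x[i]))
    votes = Counter(labeled_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy labeled set and unlabeled pool (hypothetical data).
labeled_x = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
labeled_y = [0, 0, 1, 1]
pool_x = [(0.05, 0.1), (0.6, 0.5), (0.95, 1.0)]
# Hypothetical raw outputs of the current classifier on the pool.
pool_outputs = [(2.0, -1.0), (0.1, 0.0), (-1.5, 2.5)]

picked = select_most_informative(pool_outputs, k=1)
for i in picked:
    label = knn_oracle(pool_x[i], labeled_x, labeled_y, k=3)
    # In a full loop, (pool_x[i], label) would now be passed to the
    # incremental learner and removed from the pool.
    print(i, pool_x[i], label)
```

The instance whose outputs are closest to a tie has the highest entropy, so it is the one sent to the oracle. Keeping the oracle separate from the classifier that scores importance mirrors the independence noted in (b3).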
[Degree-granting institution]: Hebei University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP181

