Research on Multi-Label Classification Algorithms and Their Applications

Published: 2018-03-01 16:14

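Keywords: multi-label classification; label correlation; ensemble algorithms; k-labelsets; context recommendation | Source: Shandong University, 2017 master's thesis | Document type: degree thesis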

【Abstract】: In recent years we have entered an era of data explosion. Growth in data volume and storage capacity lets us collect data sources of many kinds and store them in repositories. Analyzing and mining this stored data can extract valuable information that supports decision-making in business, research, and other activities. Classification is one form of such analysis: it extracts models that describe important classes of data and uses them to predict the discrete class of a data object. Depending on how many class labels a sample is assigned, classification problems divide into single-label and multi-label classification. Most traditional supervised learning tasks are single-label problems, yet in many tasks each sample must be associated with several class labels, for example in text classification (a book that belongs to several genres) and in medical diagnosis (a patient with multiple conditions). Single-label techniques cannot handle these problems, so multi-label classification has drawn broad attention from researchers in China and abroad in recent years. Existing multi-label algorithms have not yet achieved satisfactory results, and researchers have tried to improve performance by exploiting label correlations and by using classifier ensembles. Among existing algorithms, RAkEL is a relatively effective multi-label method based on classifier ensembles. However, because it combines labels at random when constructing its sub-classifiers and does not fully exploit label correlation information, its classification performance leaves room for improvement. This thesis brings label correlation and classifier ensembles into a unified framework and proposes a new multi-label classification algorithm that improves on RAkEL. In experiments, the proposed method outperforms RAkEL on several evaluation metrics and is competitive with other multi-label classification algorithms.

The thesis also explores the application of multi-label classification to recommender systems. Context-aware recommender systems use contextual information to further improve recommendation accuracy and user satisfaction, but the problem they address is still how to recommend a set of items to a target user. This thesis studies a different real-world scenario: once a user has chosen an item, we recommend the most suitable context in which to consume it. For example, a user who has already decided to watch a particular film needs advice on where (at home or at the cinema) and with whom (family or friends) to watch it for the best viewing experience. Context recommendation can both suggest the most suitable context for consuming an item, which improves the experience, and help users decide which item to choose. We cast this recommendation problem as a multi-label classification problem. We first verify that this transformation is effective, then adapt the multi-label classification algorithm to obtain a method suited to context recommendation, and run experiments on datasets from two domains. The results show that the proposed algorithm gives personalized suggestions and outperforms the original algorithm on several metrics.
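For orientation, the RAkEL baseline named in the abstract can be sketched roughly as follows. This is a minimal illustration of the general random k-labelsets idea (random label subsets, one label-powerset classifier per subset, per-label majority voting), not the improved algorithm proposed in the thesis, whose details the abstract does not give. The 1-nearest-neighbour base learner, the names `LabelPowersetNN`, `rakel_fit`, and `rakel_predict`, and the toy data are all assumptions made for illustration.

```python
import random
from itertools import combinations


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


class LabelPowersetNN:
    """Label-powerset learner restricted to one labelset.

    Assumption: a 1-nearest-neighbour rule stands in for whatever
    multi-class base classifier a real system would use.
    """

    def __init__(self, labelset):
        self.labelset = labelset

    def fit(self, X, Y):
        self.X = X
        # Each distinct combination of the k labels acts as one class.
        self.classes = [tuple(y[j] for j in self.labelset) for y in Y]
        return self

    def predict(self, x):
        nearest = min(range(len(self.X)),
                      key=lambda i: squared_distance(self.X[i], x))
        return self.classes[nearest]


def rakel_fit(X, Y, k=2, m=3, seed=0):
    """Train m label-powerset models on distinct random k-labelsets."""
    n_labels = len(Y[0])
    rng = random.Random(seed)
    # Enumerating every k-labelset is only practical for small label spaces.
    all_labelsets = list(combinations(range(n_labels), k))
    chosen = rng.sample(all_labelsets, min(m, len(all_labelsets)))
    return [LabelPowersetNN(labelset).fit(X, Y) for labelset in chosen]


def rakel_predict(models, x, n_labels, threshold=0.5):
    """Average each label's votes over the models whose labelset contains it."""
    votes = [0] * n_labels
    counts = [0] * n_labels
    for model in models:
        for label, bit in zip(model.labelset, model.predict(x)):
            votes[label] += bit
            counts[label] += 1
    # A label never covered by any labelset is predicted as absent.
    return [1 if counts[j] and votes[j] / counts[j] > threshold else 0
            for j in range(n_labels)]


# Toy data (hypothetical): label 0 follows feature 0, label 1 follows
# feature 1, and label 2 is their exclusive-or.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
models = rakel_fit(X, Y, k=2, m=3, seed=0)
print(rakel_predict(models, [0.9, 0.1], n_labels=3))
```

The labelsets here are drawn uniformly at random, which is the step the abstract identifies as a weakness: nothing steers the draw toward labels that are actually correlated.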
【Degree-granting institution】: Shandong University
【Degree level】: Master's
【Year degree granted】: 2017
【Classification number】: TP181; TP391.3
