Research on Multi-Label Classification Algorithms for Multi-Relational Graphs Based on Label Dependency
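Published: 2018-01-14 05:04
Keywords: Research on Multi-Label Classification Algorithms for Multi-Relational Graphs Based on Label Dependency  Source: Harbin Institute of Technology, master's thesis, 2016  Document type: degree thesis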
[Abstract]: With the rapid development of the mobile Internet, mobile Internet applications have become people's most important platforms for exchanging information, and these platforms link people into a wide variety of virtual social networks. Multiple social networks can be combined into a multi-relational social network by mapping their nodes or relations, and such a network is usually represented as a multi-relational graph. Multi-label classification of the nodes in a multi-relational graph has important applications in areas such as targeted online marketing, social network analysis, and social search. In multi-label classification, making effective use of label dependency information is crucial to improving classifier performance. For multi-relational graphs, label dependency information has two aspects: the label dependency implicit in the nodes' content attributes and the label dependency implicit in the relational topology. This thesis focuses on how to mine these two kinds of label dependency effectively and, on that basis, on designing multi-label classification algorithms tailored to them.

Building on content-attribute label dependency, the thesis computes label dependency from label co-occurrence information and proposes MRML-C, a multi-label classification algorithm for multi-relational graphs based on content-attribute label dependency. The algorithm incorporates a strategy of partitioning the label space by clustering, which decomposes the multi-label classification problem into several smaller subproblems and reduces algorithmic complexity. Comparative experiments show that partitioning the label space according to label dependency effectively addresses the label explosion problem, and that MRML-C achieves good classification performance on most of the datasets.

Building on relational-topology label dependency, the thesis computes label dependency from label co-occurrence information together with relational topology information and proposes MRML-R, a multi-label classification algorithm for multi-relational graphs based on relational-topology label dependency. The algorithm first partitions the label space by clustering, then applies a problem transformation method to convert each multi-label subproblem into a single-label classification problem. During model training it uses a random forest algorithm built on random-walk sample sampling, which integrates the relational topology information. Finally, a majority voting strategy combines the predictions from the individual subspaces. Comparative experiments show that MRML-R performs better on the binary evaluation metrics.
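The abstract does not give the thesis's dependency formula or its clustering method, so the following is only a minimal sketch of the general idea of scoring label dependency from co-occurrence and then partitioning the label space. It assumes a Jaccard-style co-occurrence score and a simple threshold-based grouping (connected components via union-find). The function names `cooccurrence_dependency` and `partition_labels`, the threshold, and the toy data are all hypothetical and are not taken from MRML-C or MRML-R.

```python
from itertools import combinations


def cooccurrence_dependency(label_sets, labels):
    """Score each label pair by Jaccard similarity of co-occurrence (an assumed measure)."""
    count = {label: 0 for label in labels}
    pair_count = {}
    for label_set in label_sets:
        for label in label_set:
            count[label] += 1
        for a, b in combinations(sorted(label_set), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    dependency = {}
    for a, b in combinations(sorted(labels), 2):
        both = pair_count.get((a, b), 0)
        either = count[a] + count[b] - both
        dependency[(a, b)] = both / either if either else 0.0
    return dependency


def partition_labels(labels, dependency, threshold):
    """Group labels whose dependency reaches the threshold (union-find components)."""
    parent = {label: label for label in labels}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in dependency.items():
        if score >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for label in labels:
        groups.setdefault(find(label), []).append(label)
    return sorted(sorted(group) for group in groups.values())


# Toy data: the label set attached to each of five nodes (hypothetical).
node_labels = [
    {"sports", "health"},
    {"sports", "health"},
    {"music", "film"},
    {"music", "film", "health"},
    {"sports"},
]
all_labels = ["sports", "health", "music", "film"]

dep = cooccurrence_dependency(node_labels, all_labels)
groups = partition_labels(all_labels, dep, threshold=0.5)
print(groups)  # [['film', 'music'], ['health', 'sports']]
```

Each resulting group is a smaller label subspace that could be handed to its own classifier, which is the sense in which partitioning reduces the size of each subproblem. A real implementation would likely use a proper clustering algorithm and, for the relational variant, would also fold topology information into the score.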
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP301.6
Article ID: 1422115
Article link: https://www.wllwen.com/guanlilunwen/yingxiaoguanlilunwen/1422115.html