Research on Several Key Problems in Online Social Network Analysis and Mining

Published: 2018-09-17 11:37

【Abstract】: Social networks have complex structures, diverse user behavior, and huge volumes of data generated by user activity, all of which make them challenging to study. For the same reasons they have attracted researchers from many fields, and a rich body of results has accumulated. Online social networks are nevertheless a recent phenomenon that is still growing, and many problems remain open. This thesis studies several key problems in social networks from the perspectives of individual, group, and structural characteristics. The main contents and results are as follows.

(1) For the analysis of individual users, the thesis focuses on computing the relationship between users and topics from the content that users publish. Taking the content that users post publicly on a social network as the data source, it applies non-negative matrix factorization to propose a user-topic sensitivity algorithm. By analyzing a user's past posts, the algorithm estimates how interested the user is in each topic. Experiments on real data sets show that the algorithm can effectively analyze user posts and compute user-topic sensitivity. In particular, as microblogs have become popular and information overload has grown more severe, users increasingly publish short texts, which makes per-user data very sparse. General algorithms handle sparse short-text data poorly, so the thesis also proposes a user-topic sensitivity algorithm for short texts based on word co-occurrence. Experiments on real data show that this short-text algorithm avoids the problems caused by data sparsity and computes the user-topic relationship effectively.

(2) For structure mining, the thesis introduces the concepts of critical nodes and critical blocks and designs an algorithm to find these special nodes in a network. It is widely accepted that different nodes in a social network differ in importance, and many measures and algorithms have been proposed to quantify node importance, including the various centralities, concepts such as k-shell and k-core, and algorithms based on PageRank and HITS. The thesis takes a different angle and proposes an algorithm for finding one class of important nodes, the critical nodes. It uses the properties of the Fiedler vector of a matrix to build a heuristic algorithm that finds them, and it reports extensive experiments on real data sets. The results show that critical nodes do exist in social networks and that the proposed algorithm can find them effectively.

(3) For group analysis, the thesis studies how to discover communities in a network. Community discovery has attracted many researchers, but most current work analyzes only the structure of the network. Users often join a community because a particular topic attracts them, so topics strongly influence how communities form. The thesis therefore considers network structure together with user text and proposes a community discovery algorithm based on both structure and topic. Experiments show that the algorithm uses the structural information of the network and the topic information in the text together to discover communities effectively.

In summary, the thesis addresses three problems in social networks (user analysis, special node discovery, and community discovery) and proposes techniques for user-topic computation, critical node discovery, and topic-based community discovery, which are of theoretical significance and practical value for social network analysis and mining.
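The abstract names the Fiedler vector but does not describe the critical-node heuristic built on it. The sketch below therefore illustrates only the Fiedler vector itself (the eigenvector for the second-smallest eigenvalue of the graph Laplacian L = D - A), not the thesis's algorithm. The function names, the example graph, and the use of shifted power iteration are assumptions chosen for illustration; a practical implementation would more likely call an eigensolver from a numerical library.

```python
import math


def fiedler_vector(adj, iters=500):
    """Approximate the Fiedler vector of a connected, undirected graph.

    adj maps each node to the set of its neighbours (must be symmetric).
    Illustrative method: power iteration on (shift * I - L), restricted to
    vectors orthogonal to the all-ones vector.
    """
    nodes = sorted(adj)
    n = len(nodes)
    if n < 2:
        raise ValueError("need at least two nodes")
    deg = {u: len(adj[u]) for u in nodes}
    # Every eigenvalue of L is at most 2 * max degree, so (shift * I - L)
    # has non-negative eigenvalues and its largest one (after removing the
    # all-ones direction) corresponds to the second-smallest eigenvalue of L.
    shift = 2 * max(deg.values())

    # Deterministic, non-constant start vector.
    x = {u: float(i) for i, u in enumerate(nodes)}
    for _ in range(iters):
        # Project out the all-ones eigenvector (eigenvalue 0 of L).
        mean = sum(x.values()) / n
        x = {u: x[u] - mean for u in nodes}
        # y = (shift * I - L) x, where (L x)_u = deg_u * x_u - sum of x over neighbours.
        y = {u: (shift - deg[u]) * x[u] + sum(x[v] for v in adj[u]) for u in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {u: y[u] / norm for u in nodes}
    return x


def split_by_sign(vec):
    """Two-way split of the nodes by the sign of their Fiedler entry."""
    negative = {u for u, val in vec.items() if val < 0}
    return negative, set(vec) - negative


# Hypothetical example: two triangles joined by the single edge 2-3.
example_graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
vec = fiedler_vector(example_graph)
side_a, side_b = split_by_sign(vec)
print(sorted(side_a), sorted(side_b))
```

On this example the sign split separates the two triangles, and the two bridge endpoints (nodes 2 and 3) have entries closer to zero than the other nodes. Whether entries near zero correspond to the thesis's notion of a critical node is not stated in the abstract and should not be assumed.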
【Degree-granting institution】: Northeastern University (东北大学)
【Degree level】: Doctoral
【Year degree conferred】: 2014
【Classification number】: TP393.09;TP391.1
