Research on Structural-Similarity-Based Clustering Algorithms for Large-Scale Social Networks
Topics: directed graphs + parallel algorithms; Source: Nankai University, master's thesis, 2013
[Abstract]: A social network is a complex network formed by the relationships among individuals in a social system. With the rapid development of Internet technology in the information age, social networks, and online social networks in particular, have become an indispensable medium for sharing information between people. The connections between individuals serve as channels for information propagation and play a significant role in many applications, such as advertisement delivery, discovery of potential business opportunities, effect prediction, and crisis early warning. How to extract valuable information from these huge networks has therefore become an important research topic. Network structure analysis has attracted many researchers, and network clustering is one effective means of structural analysis.
Current network clustering algorithms, however, still face major challenges. First, existing algorithms do not fully account for the characteristics of real social networks. Structural analysis of a social network differs from general network clustering: social networks often contain nodes that play special roles, and most social relationships between nodes are directed. Second, existing algorithms do not target the processing of large-scale network data. To address these problems, this thesis proposes a structural clustering algorithm for large-scale directed networks.
First, the thesis proposes clustering methods based on structural similarity for directed networks. Two different methods are proposed: 1. a two-stage method that first approximates the directed network by an undirected network and then applies a structural-similarity clustering algorithm for structural analysis; 2. a modification of the existing methods for undirected networks so that they can cluster directed networks directly.
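The two-stage method can be illustrated with a minimal sketch. The abstract gives neither the approximation rule nor the similarity formula, so both are assumptions here: `symmetrize` keeps an undirected edge whenever either direction is present, and `structural_similarity` uses the cosine-style measure over closed neighborhoods that is common in structural clustering of undirected networks. The function names and the toy edge list are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def symmetrize(directed_edges):
    """Stage 1 (assumed rule): keep an undirected edge if either direction exists."""
    adj = defaultdict(set)
    for u, v in directed_edges:
        if u == v:
            continue  # ignore self-loops
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def structural_similarity(adj, u, v):
    """Stage 2 (assumed formula): cosine-style similarity of closed neighborhoods."""
    gamma_u = adj.get(u, set()) | {u}  # node plus its neighbors
    gamma_v = adj.get(v, set()) | {v}
    return len(gamma_u & gamma_v) / sqrt(len(gamma_u) * len(gamma_v))


# Toy directed network (hypothetical data).
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "b"), ("d", "a")]
adj = symmetrize(edges)
sim_ab = structural_similarity(adj, "a", "b")  # a and b share neighbor c
sim_ad = structural_similarity(adj, "a", "d")  # d is attached only to a
```

In this toy graph the pair (a, b) scores higher than (a, d), which is the kind of signal a structural clustering step would threshold on. The union rule discards edge direction entirely; the thesis's second method, which handles direction directly, is not described in the abstract and is not sketched here.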
Second, in view of the large scale of social networks, the thesis studies how to parallelize the originally non-parallel structural-similarity clustering algorithm so that it can handle large-scale network data. Based on the characteristics of social network data, the algorithm uses a data partitioning strategy and a strategy for exchanging data between machines. A theoretical analysis proves that the parallel network structural clustering algorithm, implemented on this efficient parallel programming framework, produces the same result as the original non-parallel algorithm.
Finally, the thesis implements the proposed parallel network clustering algorithm on the MapReduce parallel architecture. Extensive experiments show that the proposed algorithm improves the accuracy of clustering on directed networks and that the parallel method handles large-scale network clustering problems effectively.
In summary, the thesis makes progress on parallel clustering of directed social networks and has good application prospects in structural information discovery for social networks.
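As a rough illustration of how a structural-similarity computation might map onto MapReduce, the sketch below simulates the map, shuffle, and reduce phases in memory. It is not the thesis's partitioning or data-exchange strategy, which the abstract does not describe. It assumes a symmetric adjacency map with sortable node IDs and the cosine-style similarity over closed neighborhoods; all function names are hypothetical, and no Hadoop API is used.

```python
from collections import defaultdict
from math import sqrt


def map_phase(node, neighbors):
    """Emit the closed neighborhood of `node` once per incident edge."""
    closed = frozenset(neighbors) | {node}
    for nbr in neighbors:
        yield tuple(sorted((node, nbr))), closed


def shuffle(pairs):
    """Group mapper output by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(edge, neighborhoods):
    """Each edge key receives exactly two neighborhoods, one per endpoint."""
    first, second = neighborhoods
    return edge, len(first & second) / sqrt(len(first) * len(second))


def parallel_similarities(adj):
    emitted = (kv for node, nbrs in adj.items() for kv in map_phase(node, nbrs))
    return dict(reduce_phase(e, vals) for e, vals in shuffle(emitted).items())


def sequential_similarities(adj):
    """Single-machine baseline used to check the simulated parallel result."""
    result = {}
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:
                gu, gv = adj[u] | {u}, adj[v] | {v}
                result[(u, v)] = len(gu & gv) / sqrt(len(gu) * len(gv))
    return result


# Toy symmetric adjacency map (hypothetical data).
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
par = parallel_similarities(adj)
seq = sequential_similarities(adj)
```

Comparing `par` with `seq` mirrors, on a toy scale, the consistency property the thesis proves for its own algorithm. Shipping a full neighborhood along every edge is the simplest possible exchange scheme and would be costly for high-degree nodes, which is presumably where a tailored partitioning strategy matters.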
[Degree-granting institution]: Nankai University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP311.13;O157.5