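Research on Micro-blog User Behavior Analysis and Network Structure Evolution
Published: 2018-05-27 20:30
Topics: micro-blog network + user behavior; Source: Beijing Jiaotong University, 2014 doctoral dissertation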
[Abstract]: With the rapid development of the Internet, and of the mobile Internet in particular, micro-blogging has become a very important form of online social network. In micro-blog networks, user access is more convenient and varied, interaction is more flexible and faster, and information spreads more quickly and widely. User behavior and network structure are the two key factors that affect the information dissemination process. In view of this, this dissertation adopts interdisciplinary ideas and methods to study the characteristics and models of user behavior in micro-blogs, the formation mechanism and growth patterns of the distributions of user characteristic quantities, network centrality and the measurement of information dissemination, and the topological features and evolution models of the network. It attempts to discover the behavior patterns of micro-blog users and the laws of network structure evolution, to build mathematical models that capture these laws, and to find strategies for predicting user behavior. The work helps to understand the behavioral characteristics of micro-blog users, deepens the understanding of the relationship between micro-blog network structure and information dissemination, and provides some exploratory results for the theoretical study of complex networks and social networks.
The research was supported by the National Natural Science Foundation of China (No. 61172072, 61271308), the Beijing Natural Science Foundation (No. 4112045), and the Graduate Innovation Project of the Fundamental Research Funds for the Central Universities (No. 2011YJS215). The main work and contributions are as follows:
1. The distributions of micro-blog user characteristic quantities and the regularities of users' posting behavior are studied, and a behavior model of micro-blog posting is established. Empirical analysis shows that the characteristic quantities of Sina micro-blog users follow different power-law distributions and are correlated with one another to different degrees. The time intervals between posts, for both individual users and groups of users, follow power-law distributions, with the power-law exponent proportional to the user's activity level. Users' interest in posting is influenced by the interactive behavior of other users and shows clear periodicity, and posting behavior is self-similar. The dissertation analyzes a posting model driven jointly by social interaction and interest, proposes a posting model in which the decay of user interest follows a Logistic function, and uses simulations of this model to verify the observed distribution of the time intervals between posts. This study contributes to a deeper understanding of micro-blog user behavior and provides a theoretical basis and a formal reference for further research on micro-blog network structure and information dissemination patterns.
2. The formation mechanism and growth patterns of the distributions of micro-blog user characteristic quantities are studied. Fitting these distributions with the double Pareto lognormal (DPLN) distribution gives better results than the lognormal or power-law distribution. In addition, users' active time follows an exponential distribution, the characteristic quantities of users with different active times all approximately follow lognormal distributions, and the growth rate of a characteristic quantity follows a lognormal distribution and is independent of the size of the quantity itself. The DPLN model is therefore used to explain how the two-segment power law of user characteristic quantities arises. Based on a K-means clustering algorithm that uses vector cosine-distance similarity, a computational method for analyzing the growth patterns of user characteristic quantities is proposed and applied to cluster the time series data of real users with different ranks and initial sizes. The causes of explosive growth in users' follower counts are analyzed, and allometric growth is found between micro-blog user characteristic quantities and the number of users.
3. The node centrality characteristics of micro-blog networks are analyzed and a method for measuring user influence is proposed. From real Sina micro-blog user data, two user relationship networks based on mutual "following" are constructed. Analysis of their topological statistics shows that both networks have small-world and scale-free properties. Four centrality indices (degree, closeness, betweenness, and k-core) and their correlations are then analyzed for each network. On this basis, the SIR information propagation model, which is based on epidemic dynamics, is used to analyze how initial spreading nodes with different centrality values affect the speed and range of information propagation in the two networks. The results show that closeness and k-core describe a node's core position in the network during information propagation more accurately than the other indices. Further analysis shows that these two indices help to identify key nodes in the information propagation topology. This method can provide theoretical support for applications such as micro-blog marketing, user recommendation, and the analysis of online public opinion.
4. A network evolution model based on community structure and mixed connection features is proposed. Further analysis of the topology of the two mutual-following networks shows that both are disassortative networks with hierarchical properties and community structure, and that their community sizes follow an exponential distribution. Then, based on the observation that the number of mutual followings of micro-blog users approximately follows a lognormal distribution, and on the structural features and generation mechanism of real mutual-following networks, a network generation model based on community structure and mixed connection features is proposed. Its mixed connection mechanism works as follows: new nodes connect within a community by preferential attachment, with fitness drawn from a lognormal distribution, and by random attachment; existing nodes, after being preferentially selected within a community, connect through neighbor linking and global linking. Simulation results show that the degree distribution, clustering coefficient, degree correlation, shortest path length, community structure, and other properties and characteristic parameters of the generated networks agree well with the real networks, and that networks with different degree distributions and clustering coefficients can be generated by adjusting the parameters.
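Contribution 1 reports that the time intervals between posts follow a power law. The abstract does not say how the exponent was fitted, so the following is only a minimal sketch of one common approach, the continuous maximum-likelihood estimator, checked on synthetic data. The function names, the cutoff `x_min`, and the sample values are all illustrative assumptions, not taken from the dissertation.

```python
import math
import random


def powerlaw_mle(intervals, x_min):
    """Continuous MLE of alpha for p(x) ~ x**(-alpha) on x >= x_min.

    Uses alpha = 1 + n / sum(ln(x_i / x_min)) over the n values >= x_min.
    """
    tail = [x for x in intervals if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all observations equal x_min; alpha is undefined")
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(alpha, x_min, n, rng):
    """Inverse-transform sampling: x = x_min * (1 - u)**(-1 / (alpha - 1))."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


# Synthetic "inter-post intervals" (hypothetical data, arbitrary time unit).
rng = random.Random(0)
intervals = sample_powerlaw(alpha=2.0, x_min=1.0, n=20000, rng=rng)
alpha_hat = powerlaw_mle(intervals, x_min=1.0)
print(f"estimated exponent: {alpha_hat:.3f}")
```

On real interval data the choice of `x_min` matters a great deal; here it is fixed by construction because the sample is synthetic.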
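Contribution 2 clusters growth time series with K-means under cosine similarity. The exact algorithm is not given in the abstract; the sketch below assumes a simple spherical K-means variant (unit-length vectors, centroids renormalized after each update) with a deterministic farthest-first initialization. The function names and the daily follower increments are hypothetical, and the series are assumed to be non-negative and non-zero.

```python
import math


def normalize(v):
    """Scale a vector to unit length so that a dot product equals cosine."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("zero vector has no direction")
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(a * b for a, b in zip(u, v))


def cosine_kmeans(series, k, iterations=20):
    """Cluster time series by shape; returns one cluster label per series."""
    points = [normalize(s) for s in series]
    # Farthest-first initialization: start from the first series, then add
    # the series least similar to the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        far = min(points, key=lambda p: max(cosine(p, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(p, centroids[j]))
                  for p in points]
        # Update step: mean direction of each cluster, renormalized.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels


# Hypothetical daily follower increments: two steady growers, two bursts.
growth = [
    [10, 10, 10, 10, 10],
    [200, 210, 190, 200, 200],
    [1, 1, 1, 50, 1],
    [2, 3, 2, 120, 3],
]
labels = cosine_kmeans(growth, k=2)
print(labels)
```

Because cosine similarity ignores magnitude, users with very different initial sizes can still fall into the same growth pattern, which is the property the clustering in contribution 2 relies on.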
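Contribution 3 compares centrality indices by seeding an SIR process at different nodes. As a rough illustration, and not the dissertation's implementation, the sketch below computes k-core indices by minimum-degree peeling and runs a discrete-time SIR simulation on a small undirected graph. The toy graph, the parameters `beta` and `gamma`, and the function names are assumptions made for the example.

```python
import random


def core_numbers(adj):
    """k-core index of every node, by repeatedly peeling a min-degree node."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[v])  # the core index never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def sir_spread(adj, seed, beta, gamma, rng):
    """Discrete-time SIR from one seed; returns how many nodes were infected.

    beta:  per-step probability that an infected node infects a neighbor.
    gamma: per-step recovery probability (must be > 0 so the run ends).
    """
    if not 0 < gamma <= 1:
        raise ValueError("gamma must be in (0, 1]")
    infected, recovered = {seed}, set()
    while infected:
        newly = set()
        for v in sorted(infected):  # sorted for reproducible random draws
            for u in sorted(adj[v]):
                if (u not in infected and u not in recovered
                        and rng.random() < beta):
                    newly.add(u)
        healed = {v for v in sorted(infected) if rng.random() < gamma}
        infected = (infected | newly) - healed
        recovered |= healed
    return len(recovered)


# Toy mutual-following graph: a triangle (a, b, c) with a pendant node d.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
core = core_numbers(adj)
reach = sir_spread(adj, seed="d", beta=1.0, gamma=1.0, rng=random.Random(0))
print(core, reach)
```

To compare seeds as the abstract describes, one would average `sir_spread` over many runs for each candidate seed and relate the mean reach to the seed's centrality values.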
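The mixed connection mechanism in contribution 4 can be sketched as a small generator. This is a heavily simplified reading of the abstract, not the dissertation's model: the attachment weight `fitness * (degree + 1)`, the random community assignment, and the parameters `m`, `p_pref`, `q_local`, `mu`, and `sigma` are all assumptions introduced for illustration.

```python
import random


def grow_network(n_nodes, n_communities, m, p_pref, q_local, rng,
                 mu=0.0, sigma=1.0):
    """Grow an undirected network with communities and mixed attachment."""
    adj = {}        # node -> set of neighbors (one mutual-follow edge each)
    community = {}  # node -> community id
    fitness = {}    # node -> lognormal fitness
    members = [[] for _ in range(n_communities)]

    def add_edge(u, v):
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    def pick_preferential(candidates):
        # Assumed weight: fitness times (degree + 1).
        weights = [fitness[c] * (len(adj[c]) + 1) for c in candidates]
        return rng.choices(candidates, weights=weights, k=1)[0]

    for node in range(n_nodes):
        # Seed each community with one node, then assign at random.
        c = node if node < n_communities else rng.randrange(n_communities)
        adj[node] = set()
        community[node] = c
        fitness[node] = rng.lognormvariate(mu, sigma)
        peers = members[c]
        # New node: up to m links inside its community, each preferential
        # with probability p_pref and uniformly random otherwise.
        for _ in range(min(m, len(peers))):
            if rng.random() < p_pref:
                add_edge(node, pick_preferential(peers))
            else:
                add_edge(node, rng.choice(peers))
        peers.append(node)
        # Existing node: chosen preferentially in the community, then linked
        # to a neighbor of a neighbor (local) or to any node (global).
        if len(peers) >= 2:
            src = pick_preferential(peers)
            second = sorted({w for u in adj[src] for w in adj[u]
                             if w != src and w not in adj[src]})
            if second and rng.random() < q_local:
                add_edge(src, rng.choice(second))
            else:
                add_edge(src, rng.randrange(node + 1))
    return adj, community


adj, community = grow_network(200, 4, m=3, p_pref=0.7, q_local=0.6,
                              rng=random.Random(1))
print(len(adj), sum(len(nbrs) for nbrs in adj.values()) // 2)
```

In this sketch, raising `q_local` adds more triangle-closing links and should raise clustering, while `p_pref` and `sigma` shape the degree distribution, which mirrors the abstract's remark that parameters can be tuned to obtain different degree distributions and clustering coefficients.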
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Doctoral
[Year degree granted]: 2014
[Classification number]: TP393.092; F206