Research on Methods for Constructing Microblog-Based Local Social Networks and Extracting Hot Figures
Topic: microblog + social network; Source: Xihua University, 2016 master's thesis
[Abstract]: With the arrival of the Internet era, the network has gradually become part of everyday life. Many users shop, make friends, and study online, and the Internet is now an important part of their lives. Online social platforms such as Sina Weibo, Tencent Weibo, and Twitter have become especially active venues. On these platforms people make new friends and share text, images, and video that interest them, and the information users publish reflects, to some extent, their behavioral habits and interests. Social data is short in content, huge in volume, and highly real-time, so mining useful information from massive social data is a major challenge in data mining. Given the large volume of user data on social platforms, constructing users' social graphs and interest graphs is key to improving the quality of social search in social networks. To construct these two graphs effectively, this thesis covers the following main points:

1. Based on the idea of link prediction, and by improving the Friend Link (FL) algorithm, this thesis proposes an active friend prediction algorithm (Active Friend Prediction, AFP). To suit online social platforms such as microblogs, where user attribute information is sparse, the user's online social network is abstracted as a directed graph (nodes represent users; edges represent relationships between users), and the similarity between users is analyzed through the local link features of the graph. The thesis introduces the concept of a node activity coefficient, which uses the ratio of each node's out-degree and in-degree to characterize how active the node is, and uses it to select active users from the user's social network graph. Combined with the link-structure similarity between nodes of the social network graph, an activity score is computed for each node. Based on that score, active indirect neighbors that have a potential relationship with the user are extracted, and these nodes are used to build the user's high-activity local social network, that is, the user's social graph.

2. This thesis proposes an algorithm for extracting the implicit and explicit hot figures a user follows (Focusing Personae Extraction algorithm, FPE). Microblogs are a social platform that carries information as short text. Although microblog text contains the person entities users pay attention to, it is also full of noise. The thesis therefore extracts person entities from the microblogs posted by the user and by the users in the user's social graph, and computes the user's attention to each person entity from the activity scores of the users in the target user's social graph and the number of microblogs that contain that entity. Person entities with high attention are treated as hot figures and used to build the user's hot-figure interest graph. The method can also be used to extract the hot figures followed across the entire local social network.

Finally, through comparative experiments, the thesis compares different link-based node similarity methods with the improved algorithm in terms of precision, recall, F value, and time efficiency, and extracts the hot figures users follow from target-user social graphs built with different link prediction algorithms. The experiments show that the improved node scoring method achieves higher precision, recall, and F value than the other methods. They also show that the proposed implicit and explicit hot-figure extraction method effectively mines the hot figures users follow, and that its precision depends on the precision of the user's social graph.
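The two scoring ideas described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's actual AFP or FPE implementation. The abstract does not give the formulas, so the ratio direction (out-degree over in-degree), the +1 smoothing, the use of the activity coefficient alone as a stand-in for the full activity score (which in the thesis also involves link-structure similarity), the weighted-sum form of attention, and every function and variable name below are hypothetical.

```python
from collections import defaultdict


def degree_counts(edges):
    """Count out-degree and in-degree per node from (source, target) edges."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return dict(out_deg), dict(in_deg)


def activity_coefficient(node, out_deg, in_deg):
    """ASSUMPTION: out-degree / in-degree, with +1 smoothing on both terms
    so that nodes with no incoming edges do not cause division by zero."""
    return (out_deg.get(node, 0) + 1) / (in_deg.get(node, 0) + 1)


def attention_scores(activity_score, mention_counts):
    """ASSUMPTION: attention(entity) is the sum, over users in the social
    graph, of the user's activity score times the number of that user's
    microblogs mentioning the entity."""
    attention = defaultdict(float)
    for user, counts in mention_counts.items():
        weight = activity_score.get(user, 0.0)
        for entity, n_posts in counts.items():
            attention[entity] += weight * n_posts
    return dict(attention)


def top_hot_figures(attention, k):
    """Return the k person entities with the highest attention."""
    return sorted(attention, key=attention.get, reverse=True)[:k]


# Toy directed graph: an edge (u, v) means u has a relationship toward v.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
out_deg, in_deg = degree_counts(edges)

# Stand-in activity scores: here just the activity coefficient per user.
scores = {u: activity_coefficient(u, out_deg, in_deg) for u in ("a", "b", "c")}

# Hypothetical counts of microblogs per user that mention each person entity.
mentions = {"a": {"Person X": 2}, "b": {"Person X": 1, "Person Y": 3}}
attention = attention_scores(scores, mentions)
print(top_hot_figures(attention, k=1))
```

Keeping the activity score and the mention counts as separate inputs means a different scoring method (for example, one that adds a link-similarity term) could be swapped in without touching the attention step.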
[Degree-granting institution]: Xihua University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP393.092; TP391.1
Article ID: 2073371
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2073371.html