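User Personality and Behavior Analysis Based on Social Networks
Published: 2018-04-13 17:23
Topics: social networks + text mining; Source: Beijing University of Posts and Telecommunications, 2014 master's thesis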
[Abstract]: Social networking sites have proliferated over the past two years. Well-known examples in China include Renren, Weibo, and QQ Zone; abroad, Facebook and Twitter. Social networks are increasingly changing how people live and socialize, and people have gradually become accustomed to posting photos, writing blog posts, and updating their status on them. At the same time, user behavior on social networks is becoming more differentiated: some users prefer to browse without posting anything, while others like to publish blog posts but rarely post photos. These behaviors are not random; they follow certain patterns. How to analyze user behavior effectively, uncover the deeper patterns behind it, and then provide users with personalized services has become a major challenge. Current user behavior analysis for social networks focuses mainly on behavioral data and does not fully mine the text that users publish, such as status updates and blog posts. It also does not incorporate a personality model of the user. If an intrinsic link between user personality and behavior can be found, it would provide new theoretical support for user analysis in social networks. The main work of this thesis is as follows:
1. Determining the analysis method. The thesis first discusses the development of social networks in China and how user data can be obtained from them. It then takes Renren as the research object and chooses to collect user data by building an in-site Renren application, in the form of an online personality test.
2. Building the in-site application. The test items are taken from a Big Five personality inventory. Using the normal distribution, the scores for each personality trait are divided evenly into five grades, and users receive feedback according to the grade they fall into. The application is implemented with a Flex front end, a Java back end, and a MySQL database. It obtains the user's authorization through OAuth and then reads the user's profile and UGC (user-generated content) data through the API.
3. Processing the user data. The profile and UGC data recorded by the application are quantified to produce behavioral statistics, mainly the frequency with which users publish status updates and blog posts or share blog posts, photo albums, and so on. The UGC is also analyzed semantically: status updates, blog posts, and other text are first segmented into words and word frequencies are counted, the weights of different words are then adjusted, and finally principal component analysis (PCA) is used to simplify the resulting data.
Based on the behavioral and semantic data obtained in these steps, linear regression and decision tree algorithms are applied to predict users' gender, age, and personality traits. The predictions are compared with known records to validate the algorithms.
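The abstract describes the personality-score grading step only briefly. The following is a minimal sketch, assuming that dividing the scores into five grades means taking equal-probability bands of a normal distribution fitted to the observed scores for one trait; the abstract does not confirm this reading. The function names and the sample scores are hypothetical and are not taken from the thesis.

```python
from statistics import NormalDist, mean, stdev

GRADES = 5  # the abstract mentions five grades per personality trait


def grade_cutoffs(scores, grades=GRADES):
    """Fit a normal distribution to the scores and return the grades-1
    cutoffs that split it into bands of equal probability (an assumed
    reading of the grading rule)."""
    fitted = NormalDist(mean(scores), stdev(scores))
    return [fitted.inv_cdf(k / grades) for k in range(1, grades)]


def grade_of(score, cutoffs):
    """Map a raw trait score to a grade from 1 (lowest) to
    len(cutoffs) + 1 (highest) by counting the cutoffs it exceeds."""
    return 1 + sum(score > c for c in cutoffs)


# Hypothetical raw scores for one trait, for illustration only.
sample_scores = [18, 22, 25, 27, 29, 30, 31, 33, 35, 38, 41, 44]
cutoffs = grade_cutoffs(sample_scores)
grades = [grade_of(s, cutoffs) for s in sample_scores]

for score, grade in zip(sample_scores, grades):
    print(f"score={score} grade={grade}")
```

Because the cutoffs come from the fitted distribution rather than from the sample's own quantiles, the five grades need not contain equal numbers of observed users; whether the thesis used fitted or empirical cutoffs is not stated.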
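[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP391.1;TP393.09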
Article ID: 1745473
Article link: https://www.wllwen.com/guanlilunwen/ydhl/1745473.html