Research and Analysis of Web-Based User Attribute Inference for Chinese Social Networking Sites

Published: 2019-03-19 10:32
[Abstract]: With the development of the Internet, social networking sites have become increasingly popular. Their users generate massive amounts of data every day, and this data holds great potential value. Because user data often involves personal privacy, users commonly hide their personal information by leaving fields blank or filling in false values, which makes some valuable user attributes difficult to obtain directly. How to infer these attributes has become a popular research topic. This thesis takes Sina Weibo users as its research subjects and infers three of their attributes: gender, age distribution, and education level distribution. The main work is as follows:

1) For gender inference of Chinese users, the thesis proposes four text-based algorithms: a nickname-based algorithm (GIABON), a tag-based algorithm (GIABOL), a Weibo-text-based algorithm (GIABOWT), and a mean-based algorithm (GIABOM). The first three consider the influence of only a single attribute on gender inference, which is a limitation, whereas GIABOM takes the influence of all of these text types into account. Experiments show that GIABOM reaches an accuracy of 85.55%, far higher than the other three algorithms. This suggests that considering several attributes together is the more reasonable approach to gender inference.

2) For age distribution inference of Chinese users, the thesis proposes an algorithm that uses a genetic algorithm to jointly optimize the parameters and feature attributes of a support vector machine (SVM). Three SVM configurations are compared: a linear kernel, a radial basis function (RBF) kernel, and an RBF kernel with parameters optimized by the genetic algorithm. Experiments show that the SVM with a linear kernel reaches an accuracy of 75.38% and the SVM with an RBF kernel reaches 86.14%, while the algorithm that optimizes SVM parameters and features with the genetic algorithm reaches 89.11%. These results verify the effectiveness and soundness of the algorithm's optimization of SVM parameters and features.

3) For education level distribution inference of Chinese users, the thesis proposes an algorithm that likewise uses a genetic algorithm to jointly optimize the parameters and feature attributes of an SVM. The approach is similar to that of the age distribution inference algorithm. Experiments show that the SVM with a linear kernel reaches an accuracy of 81.38%, the SVM with an RBF kernel reaches 92.14%, and the genetic-algorithm-optimized algorithm reaches 93.03%. This shows that the algorithm also performs well when inferring users' education level.
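The abstract does not say how GIABOM combines the individual text sources, so the following is only a minimal sketch under stated assumptions: each single-source classifier (nickname, tags, Weibo text) is assumed to output a probability that the user is male, and the combined decision is assumed to be the unweighted mean of whichever scores are available, thresholded at 0.5. The function name, the dictionary keys, and the threshold are hypothetical and are not taken from the thesis.

```python
from statistics import mean
from typing import Mapping, Optional


def infer_gender_by_mean(
    scores: Mapping[str, Optional[float]],
    threshold: float = 0.5,
) -> Optional[str]:
    """Combine per-source P(male) scores by their unweighted mean.

    `scores` maps a source name (hypothetical keys such as "nickname",
    "tags", "weibo_text") to a probability in [0, 1], or to None when
    that source is missing for the user.
    """
    # Ignore sources the user left empty (assumption: missing != 0.5).
    available = [p for p in scores.values() if p is not None]
    if not available:
        return None  # nothing to infer from
    return "male" if mean(available) >= threshold else "female"


if __name__ == "__main__":
    # Illustrative, made-up scores: the nickname alone leans female,
    # but the mean over all three sources leans male.
    example = {"nickname": 0.30, "tags": 0.75, "weibo_text": 0.75}
    print(infer_gender_by_mean(example))
```

The scores in the usage example are invented for illustration. The point is only that a single source can disagree with the combined decision, which is the limitation of single-attribute inference that the abstract describes.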
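[Degree-granting institution]: Nanjing University of Aeronautics and Astronautics
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP18; TP393.09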
