Research on Attribute Recognition Methods for Weibo Users

Published: 2018-06-27 17:26

  Topic: Weibo analysis + user attribute recognition; Source: Soochow University, master's thesis, 2015


[Abstract]: With the rapid growth of social networks, automatically analyzing the useful information they contain has become an important research topic in natural language processing, social network analysis, and related fields. Within this area, attribute recognition for Weibo users is a fundamental task: it aims to automatically identify a user's individual attributes (for example, gender and age) from the data that the user generates. Accurately identifying these attributes can support research on intelligent marketing, personalized prediction, and sentiment analysis. This thesis covers three main topics.
First, for the personal versus non-personal attribute of Weibo users, the thesis proposes a classification method that combines two kinds of information: the user name and the text of the user's posts. A separate classifier is trained on each kind of text, and a classifier-fusion method then uses both kinds of information together for classification. Experimental results show that the method achieves high recognition accuracy and that classifier fusion clearly outperforms a single classifier that uses only user names or only post text.
Second, for the gender attribute, the thesis proposes a semi-supervised gender classification method based on interactive information. Traditional gender classification relies on large numbers of labeled samples, and manual labeling is usually time-consuming and laborious. As a social networking platform, Weibo offers several mechanisms for users to interact, so it contains both non-interactive information, such as the posts a user publishes, and interactive information, such as replies. The proposed method treats the interactive and non-interactive information as the two views of a co-training algorithm and makes full use of unlabeled samples to perform semi-supervised gender classification. Experimental results show that this two-view method effectively uses unlabeled samples to improve gender classification performance.
Finally, for the age attribute, the thesis proposes a semi-supervised age regression method based on text and social information. The method uses co-training to combine a user's text and social information and makes full use of unlabeled samples to perform semi-supervised age regression. In addition, a QBC-based method is proposed to address the difficulty of measuring sample confidence in regression problems. Experimental results show that the method effectively uses unlabeled samples to improve age regression performance on both balanced and imbalanced data.
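The abstract names a QBC-based confidence measure for regression but gives no details. The sketch below illustrates one possible reading, assuming QBC means query by committee: several regressors form a committee, and an unlabeled sample counts as confident when their predictions agree. Everything concrete here is an assumption made for illustration and not the thesis's method: the single numeric feature, the toy ages, the leave-one-out committee, and the function names `fit_line`, `train_committee`, `disagreement`, and `most_confident`.

```python
from statistics import mean, pvariance


def fit_line(points):
    """Least-squares fit of y = a*x + b to (x, y) pairs; returns (a, b)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:  # all x identical: fall back to a constant predictor
        return 0.0, y_bar
    a = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    return a, y_bar - a * x_bar


def train_committee(labeled):
    """Assumed committee: one member per labeled sample, fit with that sample left out."""
    return [fit_line(labeled[:i] + labeled[i + 1:]) for i in range(len(labeled))]


def disagreement(committee, x):
    """Variance of the members' predictions at x; lower means higher confidence."""
    return pvariance([a * x + b for a, b in committee])


def most_confident(committee, unlabeled, k=1):
    """Pick the k inputs with the lowest disagreement and pseudo-label each
    with the committee's mean prediction."""
    ranked = sorted(unlabeled, key=lambda x: disagreement(committee, x))
    return [(x, mean(a * x + b for a, b in committee)) for x in ranked[:k]]


# Hypothetical toy data: (feature value, age). Not taken from the thesis.
labeled = [(1.0, 20.0), (2.0, 24.0), (3.0, 31.0), (4.0, 33.0), (5.0, 41.0)]
unlabeled = [3.0, 30.0]

committee = train_committee(labeled)
picked = most_confident(committee, unlabeled, k=1)
print(picked)  # the input near the labeled data is selected, with its pseudo-label
```

In a co-training loop, each view would run a selection step like `most_confident` on its own features and hand the pseudo-labeled samples to the other view's regressor. How the thesis builds its committee and sets the selection threshold is not stated in the abstract.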
[Degree-granting institution]: Soochow University
[Degree level]: Master's
[Year conferred]: 2015
[Classification number]: TP391.1
