Attribute Inference from Social Media Users' Commenting Behavior

Published: 2018-01-11 23:36

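Keywords: attribute inference from social media users' commenting behavior　Source: Shandong University, 2017 master's thesis　Document type: degree thesis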

【Abstract】: Social media platforms are online media that let users comment, vote, give feedback, and share content. Examples include news sites such as ifeng.com (Phoenix New Media), e-commerce sites such as Amazon and Taobao, and movie review sites such as Douban. Online user comments are one form of public opinion. They are public and readily available, and the collective opinion they express gives other users a reference when deciding whether to buy a product or use a service. Inferring user attributes from commenting behavior can help enterprises, institutions, and governments improve service quality, and can support personalized recommendation and marketing, so it has significant practical value. However, most social media users are anonymous. Their commenting data is fragmented, low in information value, and imbalanced, and the attribute distribution across the user population is severely skewed. These problems make user attribute inference challenging.

To address the imbalance, noise, and fragmentation of commenting data, this thesis introduces object information and environment information as supplements to the small number of comments per user, and uses them to support user feature modeling. It combines a hierarchical semantic modeling method based on a semantic knowledge base with a text mining method that learns word vectors from a word vector model. The two methods reduce the harmful effect of word ambiguity from a global and a local perspective respectively, while preserving the latent semantic relations in the comments, so that the latent semantic features of user comments can be mined in depth.

To address the high dimensionality of the modeled user features and the low value of fragmented data, the thesis measures feature importance by information gain and proposes improved versions of two representative probabilistic feature selection algorithms: a probabilistic wrapper feature selection algorithm and a heuristic probabilistic feature search algorithm. These perform probabilistic feature selection before classifier learning and during iterative learning, respectively. Both retain the important features while still giving low-value features a small probability of being selected, and both screen for closely related features in order to shrink the search space and improve convergence speed and learning performance.

To address the imbalance of user attributes, the thesis proposes a differential feature selection method and an iterative enhanced-learning algorithm aimed at minority-class data. The approach ensembles multiple feature-related classifiers. It takes into account different feature combinations and the suitability of each classifier, and makes the ensemble concentrate on the minority-class data that is more easily misclassified, which effectively improves the accuracy of user attribute classification.

The methods are validated on real Chinese and English datasets. The experiments examine how different behavior modeling approaches and feature screening methods affect attribute inference, how different parameters and the imbalance of the user attribute distribution affect it, and how the methods compare with other approaches. The results show that the proposed methods are effective.
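The idea of weighting feature selection by information gain while leaving low-value features a small chance of being picked can be illustrated with a minimal sketch. This is not the thesis's algorithm, whose details the abstract does not give. The function names, the `floor` smoothing constant, and the sampling-without-replacement scheme are all assumptions made for illustration, and features are assumed to be discrete.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(column, labels):
    """Information gain of one discrete feature column with respect to the labels."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def sample_features(rows, labels, k, floor=0.01, seed=0):
    """Draw up to k distinct feature indices, weighted by information gain.

    `floor` is a hypothetical smoothing constant (not from the thesis) that
    keeps a small, non-zero selection probability for low-gain features.
    Returns the chosen indices and the per-feature sampling weights.
    """
    rng = random.Random(seed)
    n_features = len(rows[0])
    weights = [
        information_gain([row[j] for row in rows], labels) + floor
        for j in range(n_features)
    ]
    remaining = list(range(n_features))
    chosen = []
    for _ in range(min(k, n_features)):
        # Sample one index from those not yet chosen, proportional to its weight.
        pick = rng.choices(remaining, weights=[weights[j] for j in remaining])[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen, weights
```

In a wrapper-style setup, a classifier would be trained on each sampled subset and the subset scored by validation accuracy. That step is omitted here because the abstract does not specify it.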
【Degree-granting institution】: Shandong University
【Degree level】: Master's
【Year degree awarded】: 2017
【Classification number】: TP391.1

【Similar literature】