Modeling Weibo User Interests Based on Multi-Dimensional Features

Published: 2018-09-09 18:18
[Abstract]: In recent years the Internet has developed rapidly, and with the spread of mobile smart devices, social networks such as Twitter and Weibo have flourished. As carriers of online information flow, social networks are convenient, fast, brief, and flexible: they connect a huge user base through "following" and "followers", and they draw more people into the spread of information through "reposts" and "comments". Social networks have greatly changed how people communicate and obtain information, but they also have a limitation of their own, namely "information overload": faced with an overwhelming volume of information, users often cannot effectively pick out what is useful to them, which hinders the diffusion of information. Solving this problem requires social network platforms to understand their users better and to model users' interests and preferences accurately and comprehensively, laying a solid foundation for personalized services. Against this background, this thesis takes Sina Weibo users as its research subject and studies a multi-dimensional, hierarchical user modeling method. "Multi-dimensional" means covering as many of the features that can describe a user as possible; "hierarchical" means sorting out the relationships among these features into a layered structure that reduces coupling and makes the model extensible. The thesis completes the following work:
1. Design of a Weibo crawling system, implementing a reasonably efficient pipeline for crawling, processing, and storage.
2. Identification of user nodes, covering two aspects: finding important user nodes with the PageRank algorithm, and identifying active user nodes by computing an activity score.
3. Modeling of short texts. To cope with short texts being brief, irregular in wording, and noisy, a topic model is introduced to train paragraph vectors that carry topic information, so that a user's microblog posts are represented as continuous-valued vectors.
4. Construction of the multi-dimensional hierarchical model. Each part of the model is built separately; at computation time the similarity results of the parts are combined by weighted summation, and the model is tested in a friend recommendation scenario.
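Two of the steps named in the abstract, ranking user nodes with PageRank and combining per-dimension similarities by weighted summation, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the damping factor of 0.85, the fixed iteration count, the use of cosine similarity for each dimension, and the example weights are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def pagerank(follows, damping=0.85, iterations=50):
    """Power-iteration PageRank over a follow graph.

    `follows` maps each user to the list of users they follow (hypothetical
    input format). An edge u -> v passes part of u's rank to v.
    """
    users = set(follows)
    for targets in follows.values():
        users.update(targets)
    users = sorted(users)
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Every user starts each round with the "random jump" share.
        new_rank = {u: (1.0 - damping) / n for u in users}
        for u in users:
            targets = follows.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A user who follows nobody spreads rank evenly to everyone.
                share = damping * rank[u] / n
                for t in users:
                    new_rank[t] += share
        rank = new_rank
    return rank


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_similarity(user_a, user_b, weights):
    """Weighted sum of per-dimension similarities, normalized by total weight.

    `user_a` and `user_b` map a dimension name to a feature vector;
    `weights` maps the same dimension names to non-negative weights.
    """
    total = sum(weights.values())
    score = sum(w * cosine(user_a[d], user_b[d]) for d, w in weights.items())
    return score / total


if __name__ == "__main__":
    graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    print(pagerank(graph))

    u1 = {"topics": [0.9, 0.1], "tags": [1.0, 0.0]}
    u2 = {"topics": [0.8, 0.2], "tags": [0.0, 1.0]}
    print(combined_similarity(u1, u2, {"topics": 0.7, "tags": 0.3}))
```

In a friend recommendation setting, candidate users could be ordered by `combined_similarity` against the target user, with the weights tuned on held-out data; how the thesis sets its weights is not stated in the abstract.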
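[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP391.1; TP393.092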


Article ID: 2233188


Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2233188.html

