Weibo-Based User Interest Analysis and Personalized Information Recommendation
Published: 2018-11-07 14:29
[Abstract]: Over the past decade or more, the amount of information on the Internet has grown rapidly, and people have moved from an era of information scarcity into an era of information overload. The way people obtain information has changed accordingly, from traditional manual lookup, to search engines, to today's recommender systems. The most important step in recommending useful information to users is capturing their interests effectively. The emergence of social networks such as Weibo provides a large new data source for analyzing user interests, and this has become a research hotspot in recent years. This thesis analyzes and explores how to use Weibo data to analyze user interests and how to make personalized recommendations. It differs from existing work in three main ways. First, because each Weibo post is short, we do not apply a topic model directly to Weibo data. Instead, we build the topic model from an external knowledge base and use it to enrich the semantics of the posts, which also avoids the difficulty of choosing the number of topics on Weibo data. Second, we argue that not every post is related to the user's interests. Such unrelated posts are noise posts, and they degrade model performance. We therefore analyze features for identifying noise posts from several angles and build a combined classifier to filter them out. Finally, we argue that user interests change over time, and we propose a time-weighted topic distribution to describe them. In the experiments, we compare our algorithm with a non-negative matrix factorization algorithm and with an algorithm that applies a topic model directly to Weibo data. The results show that our algorithm discovers users' current interests more effectively, and that it can still analyze user interests effectively when a user has few posts or many noise posts.
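The abstract does not give the formula behind the time-weighted topic distribution, so the following is only a minimal sketch of one plausible reading. It assumes each post already has a topic distribution (for example, inferred from a topic model trained on an external knowledge base) and that older posts are down-weighted by exponential decay. The function name `time_weighted_interest`, the `half_life_days` parameter, and the choice of exponential decay are illustrative assumptions, not the thesis's actual method.

```python
import math
from typing import List, Sequence


def time_weighted_interest(
    topic_dists: Sequence[Sequence[float]],
    ages_days: Sequence[float],
    half_life_days: float = 30.0,  # assumed decay setting, not from the thesis
) -> List[float]:
    """Combine per-post topic distributions into one user interest profile.

    topic_dists[i] is the topic distribution of post i and ages_days[i] is
    how many days ago that post was published. Each post is weighted by
    exp(-lambda * age), so a post half_life_days old counts half as much
    as a post published today. This decay form is an assumption.
    """
    if len(topic_dists) != len(ages_days):
        raise ValueError("topic_dists and ages_days must have the same length")
    if not topic_dists:
        raise ValueError("at least one post is required")

    num_topics = len(topic_dists[0])
    decay_rate = math.log(2) / half_life_days
    weights = [math.exp(-decay_rate * age) for age in ages_days]
    total_weight = sum(weights)

    # Weighted sum of the per-post distributions.
    profile = [0.0] * num_topics
    for dist, weight in zip(topic_dists, weights):
        for topic, prob in enumerate(dist):
            profile[topic] += weight * prob

    # Normalize by the total weight so the profile sums to 1
    # whenever every input distribution sums to 1.
    return [value / total_weight for value in profile]


if __name__ == "__main__":
    # Two hypothetical posts over two topics: one from today, one 30 days old.
    posts = [[1.0, 0.0], [0.0, 1.0]]
    ages = [0.0, 30.0]
    print(time_weighted_interest(posts, ages, half_life_days=30.0))
```

With a 30-day half-life, the post from today carries twice the weight of the 30-day-old post, so the profile leans toward the more recent topic. Noise filtering, which the abstract describes as a separate classifier step, would run before this aggregation and is not shown here.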
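[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP393.092; TP391.3
Article ID: 2316647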
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2316647.html