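Research on a User Interest Model System Based on Web Page Interest Degree
Published: 2018-03-02 03:22
Keywords: user interest model; interest degree; vector space model; time segmentation; time decay. Source: Fudan University, master's thesis, 2012. Document type: degree thesis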
[Abstract]: In the Web 2.0 era, new forms of Internet application such as blogs, social networking services (SNS), microblogs, light blogs, and Q&A sites keep emerging, and the amount of information on the Internet has grown explosively. By contrast, the content a user is interested in at any given time is relatively limited, and it is often drowned in this ocean of information. Search engines are currently the most common way to help users find information; they filter information mainly by string-matching the keywords a user enters, combined with some optimization algorithms. Since the great success of Amazon's product recommendation service, the focus of information-filtering research has gradually extended to intelligent information push. How to mine the content users are interested in from massive data, and thereby deliver intelligent personalized recommendation services, has become a hot research topic in both academia and the IT industry.
A user interest model is one way to implement intelligent content recommendation. It is a mathematical representation of a user's different points of interest, built by analyzing the content the user visits and the user's browsing behavior to extract content features and the user's degree of interest in the content (Interest Rate, IR). Once the interest model is built, existing content is compared against it, and the content that best matches the user's interests is recommended. For content feature extraction, this thesis uses the vector space model (VSM) to represent articles. For interest evaluation, it proposes a user-behavior evaluation algorithm that incorporates time measurements, so that the extracted interests are closer to the user's real interests. For model updating, many researchers working on VSM-based user interest models have overlooked interest drift: they do not distinguish a user's interests at different times, so changes in interest cannot be detected quickly and the model cannot accurately reflect the user's latest interests. These models also lack an update mechanism, so every update requires recomputing statistics over all of a user's browsing records, which is computationally heavy and expensive in data storage. Both problems hinder long-term practical use of interest models. To address them, this thesis optimizes earlier user interest models by introducing a time segmentation mechanism and a time decay mechanism for interests, in order to improve overall system performance.
Based on this theoretical study, the thesis builds an interest model system and collects 2,000 articles on two topics from the Sina portal, the World Expo and Manchester United, to form the article content library. While running, the system continuously collects users' browsing operations, analyzes their browsing behavior, updates their interest models, and finally pushes content of interest to users according to those models. Observation and experiments show that the system reflects changes in user interest well and has good performance stability, which demonstrates the correctness and effectiveness of the proposed interest model system.
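The abstract does not give the thesis's formulas, so the following Python sketch only illustrates the general idea under stated assumptions: articles are represented as plain term-frequency vectors (a simple stand-in for the VSM), the interest rate is approximated by dwell time relative to an expected reading time, and older time segments are discounted exponentially. The names `interest_rate`, `InterestModel`, and `half_life_segments`, as well as the decay form itself, are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter

def tf_vector(tokens):
    """Term-frequency vector for one article (simple VSM stand-in, no IDF)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def interest_rate(dwell_seconds, expected_seconds):
    """Assumed IR in [0, 1]: dwell time relative to expected reading time."""
    return min(dwell_seconds / expected_seconds, 1.0)

class InterestModel:
    """Incremental profile: only the decayed sum and the open segment are stored."""

    def __init__(self, half_life_segments=2.0):
        # Assumption: exponential decay applied once per closed time segment.
        self.decay = 0.5 ** (1.0 / half_life_segments)
        self.profile = {}  # term -> weight accumulated over closed segments
        self.current = {}  # term -> weight within the open segment

    def record_view(self, doc_vector, ir):
        """Add one browsed article, weighted by its interest rate."""
        for term, w in doc_vector.items():
            self.current[term] = self.current.get(term, 0.0) + ir * w

    def close_segment(self):
        """Decay the old profile, then fold in the segment that just ended."""
        merged = {t: w * self.decay for t, w in self.profile.items()}
        for t, w in self.current.items():
            merged[t] = merged.get(t, 0.0) + w
        self.profile = merged
        self.current = {}

    def recommend(self, articles, k=1):
        """Return the k article names whose vectors best match the profile."""
        ranked = sorted(articles,
                        key=lambda name: cosine(self.profile, articles[name]),
                        reverse=True)
        return ranked[:k]

# Toy data: the user reads about the Expo first, then drifts to football.
articles = {
    "expo": tf_vector(["expo", "pavilion", "shanghai", "expo"]),
    "united": tf_vector(["manchester", "united", "match", "goal"]),
}
model = InterestModel(half_life_segments=1.0)
model.record_view(articles["expo"], interest_rate(120, 120))
model.close_segment()
model.record_view(articles["united"], interest_rate(60, 120))
model.close_segment()
model.record_view(articles["united"], interest_rate(120, 120))
model.close_segment()
print(model.recommend(articles))  # ['united']
```

Because each closed segment is folded into a single decayed profile, an update touches only the current segment's records instead of the full browsing history, which is the kind of saving the abstract attributes to its update mechanism. The decay also lets the more recent football views outweigh the older Expo view.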
[Degree-granting institution]: Fudan University
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.1