Research on Topic-Model-Based User Interest Mining and Context-Aware Recommender System Algorithms

Published: 2018-02-10 03:10

Keywords: topic model; context-aware; recommender system　Source: Shandong University, 2017 master's thesis　Document type: degree thesis

[Abstract]: The development of Internet technology has led to geometric growth in digital information resources. In the smart TV domain, this shows up as massive volumes of video data produced every day, diverse user interaction behaviors, and a resulting surge in the number of user actions. As national strategies such as "triple-play" network convergence advance and the demands of smart TV users keep expanding, how to process and make effective use of large-scale data has become an urgent problem in this field. With the rapid development of big data, search engines, and personalized recommendation technology, the problem has received attention and solutions have gradually emerged. Personalized recommender systems have been widely studied and applied: they help users discover their own interests, help the system build user profiles, sustain users' attention to content, and reduce user churn for the related services.

Current recommender systems fall into two main families, model-based and neighborhood-based. Model-based systems express user interest accurately and perform strongly in recommendation quality, while neighborhood-based systems are simpler to implement and easier to interpret. How to combine the strengths of both when building a recommendation model is one research focus of this thesis. In addition, because a television is a shared terminal, the user interest it exhibits can differ considerably across time contexts. How to model time context appropriately in order to improve recommendation quality is the other focus of this thesis.

We first propose a recommendation algorithm based on a short-text LDA topic model. It is a model-based approach that applies a latent semantic model from text mining to recommendation in order to build users' topic interests accurately. To handle the data sparsity caused by users typically having watched only a few videos, the algorithm replaces the modeling of individual video items in the original LDA algorithm with direct modeling of, and sampling over, video co-occurrence pairs. This largely alleviates the sparsity problem and improves the accuracy of user interest mining. With the short-text LDA topic model, a user's viewing records are converted into two matrices in a low-dimensional space: the user interest matrix (user-topic) and the video membership matrix (topic-video).

Building on the mined user interests, and to address recommendation on shared TV terminals, we introduce time context information and construct a collaborative filtering algorithm based on user interest. The algorithm is at its core neighborhood-based: users with similar interests are recommended each other's videos. When user interest is constructed, a context-aware pre-filtering strategy is applied: a context constraint is added while building video co-occurrence pairs, so that pairs are formed only among videos belonging to the same time context. This pre-filtering brings time context into the model, separates user interests across time periods, and avoids placing unrelated videos in the same co-occurrence pair. When the recommendation list is recalled and finally ranked, a context-aware post-filtering strategy is applied as well: each video is weighted by whether it is worth recommending in the current setting, based on the user's topic-interest distribution in the current context. This post-filtering further screens candidates, on top of user interest, according to the time context of the recommendation request, and substantially improves recommendation quality.

To validate the model experimentally, we use a real dataset from the Hisense TV cloud platform, a well-known television recommendation platform in China, compare against several recommendation algorithms, and evaluate on a range of metrics. On this dataset our method achieves high recall, MAP, and MRR, and clearly outperforms the other traditional and context-aware recommendation algorithms compared, which demonstrates its effectiveness.
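
The two context-aware steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the four time-of-day buckets, the function names (time_context, build_cooccurrence_pairs, rerank), the dictionary-based topic distributions, and the multiplicative weighting are all assumptions chosen for illustration, and the topic model itself is not shown. The first function forms video co-occurrence pairs only within one time context (pre-filtering); the second reweights candidate scores by the user's topic distribution in the current context (post-filtering).

```python
from collections import defaultdict
from itertools import combinations


def time_context(hour):
    """Map an hour of day (0-23) to a coarse context label (assumed buckets)."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"


def build_cooccurrence_pairs(history):
    """Pre-filtering: pair only videos watched in the same time context.

    history: list of (video_id, hour) tuples for one shared TV account.
    Returns a sorted list of (context, video_a, video_b) tuples.
    """
    by_context = defaultdict(set)
    for video_id, hour in history:
        by_context[time_context(hour)].add(video_id)

    pairs = []
    for context, videos in by_context.items():
        # Videos from different contexts never end up in the same pair.
        for a, b in combinations(sorted(videos), 2):
            pairs.append((context, a, b))
    return sorted(pairs)


def rerank(candidates, context_topic_dist, video_topic_dist):
    """Post-filtering: weight each candidate by its fit to the current context.

    candidates: list of (video_id, base_score), e.g. from neighborhood CF.
    context_topic_dist: topic -> probability for this user in this context.
    video_topic_dist: video_id -> {topic -> probability}.
    The product weighting is an assumed, illustrative choice.
    """
    scored = []
    for video_id, base_score in candidates:
        topics = video_topic_dist.get(video_id, {})
        weight = sum(context_topic_dist.get(t, 0.0) * p for t, p in topics.items())
        scored.append((video_id, base_score * weight))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical viewing history: cartoons in the morning, drama and news at night.
history = [("cartoon_1", 8), ("cartoon_2", 9),
           ("drama_1", 20), ("drama_2", 21), ("news_1", 20)]
pairs = build_cooccurrence_pairs(history)

# Hypothetical evening request: the drama candidate should outrank the cartoon.
ranked = rerank(
    [("cartoon_3", 0.9), ("drama_3", 0.8)],
    {"kids": 0.1, "drama": 0.9},
    {"cartoon_3": {"kids": 1.0}, "drama_3": {"drama": 1.0}},
)
```

In this toy history no pair mixes a morning cartoon with an evening drama, which is the effect the pre-filtering step is meant to have. In the thesis's pipeline the resulting pairs would feed the short-text topic model, whose output would supply the topic distributions that the sketch takes as given.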

[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.3

[Similar literature]