Research on Hybrid Collaborative Filtering Recommendation Algorithms Incorporating Context Information

Published: 2018-01-19 03:10

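Keywords: recommender systems; collaborative filtering; matrix factorization; hierarchical classification; content association; transfer learning. Source: Beijing Jiaotong University, doctoral dissertation, 2016. Type: degree thesis.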

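【Abstract】: Computers have become widespread and network technology has advanced. As a result, Internet information services have permeated every aspect of daily life and are fundamentally changing people's traditional ways of living. This has been especially true in recent years. Mobile devices such as smartphones and tablets are now in wide use, and mobile applications such as WeChat and Weibo have risen. Together they have removed the time and space constraints of traditional PC-based Internet access, so people can now obtain and share information over the Internet more conveniently, freely, and quickly.

Yet as Internet information services have flourished, the volume of information resources has grown explosively. Finding the information one actually wants has become increasingly difficult, which is the so-called information overload problem. Against this background, recommender systems were proposed and have become one of the most effective technologies for addressing it.

Collaborative filtering is currently the most widely used and most successful technique in recommender systems. It needs only a small amount of historical user-item rating data to quickly build a working system that predicts users' latent information needs, and it is simple, easy to use, and accurate. However, data keep growing larger, data types are becoming richer, and application environments are becoming more complex. Traditional collaborative filtering algorithms therefore face increasingly severe problems of data sparsity, cold start, scalability, and interpretability.

Recently, several studies have tried to incorporate contextual information into collaborative filtering and have achieved some performance gains. These preliminary attempts show that contextual information is closely related to user interests, and that introducing it helps improve prediction accuracy and user satisfaction. Incorporating contextual information is therefore important for improving collaborative filtering. Accordingly, this dissertation systematically analyzes collaborative filtering algorithms and examines contextual information in greater depth. For historical rating data with different kinds of context, it designs several hybrid collaborative filtering algorithms that use contextual information more effectively to address the problems current recommender systems face. The main work and contributions are as follows:

1. Collaborative filtering incorporating item category structure and content information. Most existing research on scalability and cold start focuses on users and pays little attention to the items that are continually updated in a system. In particular, it lacks scalability for large numbers of items and cannot produce satisfactory recommendations for new items. This study observes that, given an explicit item classification, items of the same category necessarily share some content attributes or other latent features. A user should therefore have similar interest in items of the same category. Building on this observation, and starting from item relationships and item features, the study proposes a hierarchical collaborative filtering algorithm that refines user interest step by step. It uses contextual information such as item categories and item content (keywords). Analysis shows that the algorithm scales to large numbers of items and also solves the cold-start problem for new items. Experiments on real datasets show that it achieves high prediction accuracy at different sparsity levels and has good cold-start prediction ability for new items.

2. Collaborative filtering incorporating associations between user and item content contexts. In the previous algorithm, item category information helps refine user interest through item similarity. However, the categories must be built in advance, and this relatively demanding data requirement limits the algorithm's applicability. The algorithm also cannot scale with respect to users and cannot solve the cold-start problem for new users. To design a more general and scalable algorithm, this study turns to content context: user content information (tags) and item content information (keywords). The historical user-item rating data establish associations between these content contexts. Building on this observation, the study combines collaborative filtering with content-based recommendation and proposes an indirect collaborative filtering algorithm that produces predictions from content similarity. Analysis shows that the algorithm is highly interpretable and scalable. Experiments on real datasets show that it achieves high prediction accuracy at different sparsity levels and has good cold-start prediction ability for both new users and new items.

3. Collaborative filtering incorporating latent information shared between subgroups. Besides coupling contextual information directly with recommendation algorithms, a class of subgroup-based improvements has recently emerged. These methods partition the whole dataset into subgroups according to contextual information, then run collaborative filtering separately on each subgroup to produce its own predictions. However, imbalanced, sparse data make the collaborative filtering results on the subgroups unstable. Analyzing these subgroups reveals implicit connections among the users and items they contain. Building on this observation, the study proposes a cross-group collaborative filtering algorithm based on knowledge transfer, starting from the latent information that subgroups share. It uses the collaborative filtering results on a few well-performing subgroups to construct multiple approximations of the rating matrix, then aggregates these approximations by weighting to produce predictions. Analysis shows that the algorithm avoids some unnecessary computation on poorly performing subgroups. Experiments on real datasets show that it improves prediction accuracy, with especially clear gains on very sparse data, indicating that it alleviates the data sparsity problem.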
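The indirect collaborative filtering idea in contribution 2 can be illustrated with a minimal Python sketch. In that idea, historical ratings link user tags to item keywords, and predictions come from content rather than from a user's or item's own rating history. The sketch rests on two assumptions made only for illustration. First, the association between a tag and a keyword is the mean rating over rated user-item pairs that carry both. Second, a prediction averages the associations between a user's tags and an item's keywords. The function names and toy data are hypothetical, and this is not the dissertation's actual formulation.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sketch: ratings link user tags to item keywords. The strength of a
# (tag, keyword) pair is ASSUMED here to be the mean rating over all rated
# (user, item) pairs carrying both; this is not the dissertation's formula.


def learn_associations(ratings, user_tags, item_keywords):
    total = defaultdict(float)
    count = defaultdict(int)
    for user, item, rating in ratings:
        for pair in product(user_tags.get(user, ()), item_keywords.get(item, ())):
            total[pair] += rating
            count[pair] += 1
    return {pair: total[pair] / count[pair] for pair in total}


def predict(assoc, tags, keywords, default):
    # Needs only content (tags, keywords), so it also covers unrated users/items.
    scores = [assoc[pair] for pair in product(tags, keywords) if pair in assoc]
    return sum(scores) / len(scores) if scores else default


# Hypothetical toy data.
ratings = [("u1", "i1", 5.0), ("u1", "i2", 2.0), ("u2", "i1", 4.0)]
user_tags = {"u1": {"student"}, "u2": {"student", "gamer"}}
item_keywords = {"i1": {"sci-fi"}, "i2": {"romance"}}
global_mean = sum(r for _, _, r in ratings) / len(ratings)

assoc = learn_associations(ratings, user_tags, item_keywords)
# A new user tagged "student" and a new item with keywords "sci-fi" and "romance":
print(predict(assoc, {"student"}, {"sci-fi", "romance"}, global_mean))  # 3.25
```

Here ("student", "sci-fi") averages the ratings 5.0 and 4.0 to get 4.5, and ("student", "romance") is 2.0, so the unseen pair gets 3.25. A user or item whose content matches no learned pair falls back to the global mean.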
[Abstract]: With the development of computer and network technology, Internet information services have gradually penetrated all aspects of people's lives and are fundamentally changing their traditional way of life. This is especially so in recent years. Mobile devices such as smartphones and tablets are widely used, and mobile applications such as WeChat and Weibo (microblogs) have emerged. Together they have broken through the time and space constraints of traditional PC-based Internet access, so people can now access and share information over the Internet more conveniently, freely, and quickly.

However, with the rapid development of Internet information services, the scale of information resources has grown explosively. It has become increasingly difficult to find the information one wants on the Internet, which gives rise to the information overload problem. In this context, recommender systems were proposed and have become one of the most effective technologies for solving this problem.

At present, collaborative filtering is the most widely used and most successful technique in recommender systems. It needs only a small amount of historical user-item rating data to quickly build a usable system that predicts users' potential information needs. It has the advantages of simplicity, ease of use, and high accuracy. However, data scale keeps increasing, data types are becoming more abundant, and application environments are growing more complex. Traditional collaborative filtering algorithms therefore face increasingly severe problems of data sparsity, cold start, scalability, and interpretability.

Recently, some studies have tried to incorporate context information into collaborative filtering algorithms and have achieved some performance improvement. These preliminary attempts show that context is closely related to user interest and that introducing it helps improve prediction accuracy and user satisfaction. Fusing context information is therefore of great significance for improving collaborative filtering algorithms. In view of this, this dissertation systematically analyzes collaborative filtering algorithms and discusses context information in greater depth. For historical rating data with different contexts, it designs a variety of hybrid collaborative filtering algorithms that use context information more efficiently to solve the problems facing current recommender systems. The main work and innovations of this dissertation are as follows:

1. A collaborative filtering algorithm fusing item category structure and content information. At present, most research on scalability and cold start focuses on users and pays little attention to the items that are dynamically updated in the system. In particular, it lacks scalability for large numbers of items and cannot give satisfactory recommendations for new items. This study finds that, given an explicit item classification, items of the same category necessarily share some content attributes or other latent features. Users should therefore have similar interest in items of the same category. Based on this finding, and starting from item relationships and item features, this study proposes a hierarchical collaborative filtering algorithm that gradually optimizes user interest. It draws on context such as item category information and item content information (keywords). Analysis shows that the algorithm scales to large numbers of items and also solves the cold-start problem for new items. Experimental results on real datasets show that it achieves high prediction accuracy at different sparsity levels and has good cold-start prediction ability for new items.

2. A collaborative filtering algorithm fusing user-item content context associations. In the previous algorithm, item classification information helps optimize user interest through item similarity. However, the classification must be constructed in advance, and this high data requirement limits the algorithm's range of application. The algorithm also cannot scale with respect to users and cannot solve the cold-start problem for new users. To design a more general and scalable algorithm, this study turns to content context: user content information (tags) and item content information (keywords). The historical user-item rating data establish associations between these contexts. Based on this finding, and starting from content context, this study combines collaborative filtering with content-based recommendation. It proposes an indirect collaborative filtering algorithm that produces predictions from content similarity. Analysis shows that the algorithm has strong interpretability and scalability. Experimental results on real datasets show that it achieves high prediction accuracy at different sparsity levels and has good cold-start prediction ability for both new users and new items.

3. A collaborative filtering algorithm fusing latent information shared between subgroups. Besides coupling context information directly with recommendation algorithms, a class of subgroup-based improved algorithms has recently emerged. Their main idea is to divide the entire dataset into subgroups according to context information, then run a collaborative filtering algorithm on each subgroup to produce its own predictions. However, imbalanced sparse data make the collaborative filtering results on the subgroups unstable. Analysis of these subgroups reveals hidden connections between the users and items they contain. Based on this finding, and starting from the latent information shared between subgroups, this study proposes a cross-group collaborative filtering method based on knowledge transfer. It uses the collaborative filtering results on a few better-performing subgroups to construct multiple approximations of the rating matrix, then aggregates these approximations by weighting to produce predictions. The analysis shows that the algorithm avoids unnecessary computation on some poorly performing subgroups. Experimental results on real datasets show that it improves prediction accuracy, especially on very sparse data, indicating that it alleviates the data sparsity problem.
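The cross-group step described in contribution 3 can be sketched in Python with NumPy. The steps are: build several approximations of the rating matrix from a few good subgroups, then combine them by weighting. This is a minimal illustration under explicit assumptions, not the dissertation's algorithm:

- Subgroups are given as arrays of user indices.
- The "better-performing" subgroups are approximated by the densest ones, so sparse subgroups are never factorized.
- Each chosen subgroup's item factors come from a plain alternating least squares (ALS) fit.
- Those item factors are transferred to all users by ridge regression, giving one full-matrix approximation per subgroup.
- The approximations are weighted inversely to their error on the observed ratings.

All function names and parameters (`rank`, `lam`, `top_k`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch: subgroup selection by density, ALS item factors, ridge
# transfer to all users, and inverse-error weights are ASSUMPTIONS for illustration.


def ridge_rows(R, M, F, lam):
    """Fit one latent vector per row of R by ridge regression on its observed entries."""
    k = F.shape[1]
    out = np.zeros((R.shape[0], k))
    for row in range(R.shape[0]):
        obs = M[row]
        Fo = F[obs]
        out[row] = np.linalg.solve(Fo.T @ Fo + lam * np.eye(k), Fo.T @ R[row, obs])
    return out


def als(R, M, rank, lam, iters=30, seed=0):
    """Plain alternating least squares on the observed entries of R."""
    rng = np.random.default_rng(seed)
    V = rng.normal(scale=0.1, size=(R.shape[1], rank))
    for _ in range(iters):
        U = ridge_rows(R, M, V, lam)
        V = ridge_rows(R.T, M.T, U, lam)
    return U, V


def rmse(A, R, M):
    return float(np.sqrt(np.mean((A[M] - R[M]) ** 2)))


def cross_group_predict(R, M, groups, top_k=2, rank=2, lam=0.1):
    # Assumed selection rule: keep the densest subgroups, skip the rest entirely.
    chosen = sorted(groups, key=lambda users: M[users].mean(), reverse=True)[:top_k]
    approximations, errors = [], []
    for users in chosen:
        _, V = als(R[users], M[users], rank, lam)  # item factors learned inside one subgroup
        U = ridge_rows(R, M, V, lam)               # transfer: refit every user against them
        A = U @ V.T                                # one approximation of the full matrix
        approximations.append(A)
        errors.append(rmse(A, R, M))
    weights = 1.0 / (np.array(errors) + 1e-8)      # better fit -> larger weight
    weights /= weights.sum()
    return sum(w * A for w, A in zip(weights, approximations))


# Toy rank-one rating matrix with a few hidden (unobserved) entries.
R = np.outer([1, 2, 3, 4, 5, 6], [1, 2, 3, 2]).astype(float)
M = np.ones_like(R, dtype=bool)
M[0, 3] = M[3, 1] = M[4, 2] = M[5, 0] = False
R[~M] = 0.0
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]

P = cross_group_predict(R, M, groups)
print(P.shape, round(rmse(P, R, M), 3))
```

With these toy data the third subgroup is the sparsest, so only the first two are factorized. Each of the two approximations covers every user and item, including the hidden entries, before the weighted average is taken.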

【Degree-granting institution】: Beijing Jiaotong University
【Degree level】: Doctorate
【Year conferred】: 2016
【Classification number】: TP391.3

【Related Literature】