Research on Hybrid Collaborative Filtering Recommendation Algorithms Incorporating Context Information
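Keywords: recommender system, collaborative filtering, matrix factorization, hierarchical classification, content association, transfer learning. Source: Beijing Jiaotong University, doctoral dissertation, 2016. Document type: dissertation.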
[Abstract]: With the spread of computers and the development of network technology, Internet information services have gradually permeated every aspect of daily life and are fundamentally changing people's traditional way of living. In recent years in particular, the widespread use of mobile devices such as smartphones and tablets, together with the rise of mobile applications such as WeChat and Weibo, has removed the time and place constraints of traditional PC-based Internet access. People can now obtain and share information over the Internet more conveniently, freely, and quickly. However, the rapid growth of Internet information services has also caused explosive growth in the volume of available information. As a result, finding the information one actually wants has become increasingly difficult. This is the so-called information overload problem. Against this background, recommender systems were proposed and have become one of the most effective technologies for addressing it.
Collaborative filtering is currently the most widely used and most successful technique in recommender systems. It needs only a small amount of historical "user-item" rating data to quickly build a working system that predicts users' latent information needs, and it is simple, easy to use, and accurate. However, datasets are growing larger, data types richer, and application environments more complex. Traditional collaborative filtering algorithms therefore face increasingly severe problems of data sparsity, cold start, scalability, and interpretability. Recently, some studies have incorporated context information into collaborative filtering and achieved some performance gains. These preliminary attempts show that context information is closely related to user interests. Introducing it helps improve prediction accuracy and user satisfaction, so incorporating context information is important for improving collaborative filtering. Accordingly, this dissertation systematically analyzes collaborative filtering algorithms and examines context information in greater depth. For historical rating data with different kinds of context, it designs several hybrid collaborative filtering algorithms that use context information more efficiently to address the problems current recommender systems face. The main work and contributions are as follows:
1. Collaborative filtering incorporating item category structure and content information. Most current research on scalability and cold start focuses on users and pays little attention to the dynamically updated items in a system. In particular, existing methods lack scalability for large numbers of items and cannot produce satisfactory recommendations for new items. This study observes that, when an explicit item classification exists, items of the same category necessarily share some content attributes or other latent features. A user should therefore have similar interest in items of the same category. Building on this observation, the study starts from item relationships and item features. It uses context such as item category information and item content information (keywords) to propose a hierarchical collaborative filtering algorithm that progressively refines user interest. Analysis shows that the algorithm scales to large numbers of items and also solves the cold-start problem for new items. Experiments on real datasets show that it achieves high prediction accuracy at different levels of data sparsity and shows good cold-start prediction ability for new items.
2. Collaborative filtering incorporating associations between user and item content context. In the preceding algorithm, item category information helps refine user interest through item similarity, but the classification must be built in advance. This relatively demanding data requirement limits the algorithm's applicability. In addition, that algorithm cannot scale with the number of users and cannot solve the cold-start problem for new users. To design a more general and scalable algorithm, this study turns to content context, namely user content information (tags) and item content information (keywords). The historical user-item rating data establish associations between these two kinds of content context. Building on this observation, the study combines collaborative filtering with content-based recommendation. It proposes an indirect collaborative filtering algorithm that generates predictions from content similarity. Analysis shows that the algorithm has strong interpretability and scalability. Experiments on real datasets show that it achieves high prediction accuracy at different levels of data sparsity and shows good cold-start prediction ability for both new users and new items.
3. Collaborative filtering incorporating latent information shared between subgroups. Besides coupling context information directly with the recommendation algorithm, a class of subgroup-based improvements has recently emerged. The whole dataset is partitioned into subgroups according to context information, and collaborative filtering is run separately on each subgroup to produce its own predictions. However, imbalanced, sparse data make the collaborative filtering results on the subgroups unstable. Analysis of these subgroups reveals hidden connections between the users and items they contain. Building on this observation, the study starts from the latent information shared between subgroups and proposes a cross-group collaborative filtering algorithm based on knowledge transfer. The algorithm uses the collaborative filtering results of a few well-performing subgroups to construct several approximations of the rating matrix. It then aggregates these approximations with weights to produce predictions. Analysis shows that the algorithm avoids some unnecessary computation on poorly performing subgroups. Experiments on real datasets show that it improves prediction accuracy, with especially clear gains on very sparse data, which indicates that it alleviates the data sparsity problem.
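The abstract does not give the formulas of the hierarchical algorithm in contribution 1. As a rough illustration of its core idea (refining a user's interest level by level, from the whole catalog down to an item category), the Python sketch below fits a global mean, a per-user bias, and a per-user-per-category offset, each shrunk toward zero. It is a hypothetical baseline-style predictor, not the dissertation's algorithm. The function names, the `shrink` pseudo-count, and the data layout are assumptions. Because the category level needs only an item's category, the sketch can score a new item that has no ratings yet.

```python
from collections import defaultdict


def fit_hierarchical_biases(ratings, item_category, shrink=5.0):
    """Fit mu, user biases, and user-category offsets (hypothetical sketch).

    ratings: list of (user, item, rating) triples.
    item_category: dict mapping item -> category label.
    shrink: pseudo-count pulling sparse estimates toward zero (assumption).
    """
    # Level 0: global mean rating.
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Level 1: how far each user sits above or below the global mean.
    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    for u, _, r in ratings:
        user_sum[u] += r - mu
        user_cnt[u] += 1
    b_user = {u: user_sum[u] / (user_cnt[u] + shrink) for u in user_sum}

    # Level 2: the user's extra liking for a whole category, estimated on
    # the residual left after levels 0 and 1.
    uc_sum, uc_cnt = defaultdict(float), defaultdict(int)
    for u, i, r in ratings:
        key = (u, item_category[i])
        uc_sum[key] += r - mu - b_user[u]
        uc_cnt[key] += 1
    b_uc = {k: uc_sum[k] / (uc_cnt[k] + shrink) for k in uc_sum}
    return mu, b_user, b_uc


def predict(model, item_category, user, item):
    """Predict a rating; unseen users, items, or categories contribute 0."""
    mu, b_user, b_uc = model
    category = item_category.get(item)
    return mu + b_user.get(user, 0.0) + b_uc.get((user, category), 0.0)


# Usage: "i4" (sci-fi) and "i5" (romance) are new items nobody has rated.
ratings = [("u1", "i1", 5), ("u1", "i2", 4), ("u1", "i3", 1),
           ("u2", "i1", 3), ("u2", "i3", 2)]
item_category = {"i1": "sci-fi", "i2": "sci-fi", "i3": "romance",
                 "i4": "sci-fi", "i5": "romance"}
model = fit_hierarchical_biases(ratings, item_category)
print(round(predict(model, item_category, "u1", "i4"), 3))  # 3.518
print(round(predict(model, item_category, "u1", "i5"), 3))  # 2.771
```

User `u1` rated sci-fi highly, so the new sci-fi item scores above the new romance item even though neither has a single rating.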
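Contribution 2 predicts a rating from content similarity instead of from shared raters. One simple way to make this concrete assumes that users carry tag sets and items carry keyword sets. Each observed rating r(v, j) is then weighted by how closely user v's tags match the target user's tags and how closely item j's keywords match the target item's keywords. The sketch below uses Jaccard similarity and a global-mean fallback. These choices and the function names are illustrative assumptions, not the dissertation's formulation. A new user or item needs only tags or keywords, not ratings, to receive a prediction.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_indirect(ratings, user_tags, item_keywords, user, item):
    """Content-similarity-weighted average of observed ratings (hypothetical sketch).

    ratings: list of (user, item, rating) triples.
    user_tags: dict user -> iterable of tags.
    item_keywords: dict item -> iterable of keywords.
    """
    target_tags = user_tags.get(user, ())
    target_words = item_keywords.get(item, ())
    num = den = 0.0
    for v, j, r in ratings:
        # A rating counts only as much as both its rater and its item
        # resemble the target user and target item in content.
        w = (jaccard(target_tags, user_tags.get(v, ()))
             * jaccard(target_words, item_keywords.get(j, ())))
        num += w * r
        den += w
    if den == 0.0:
        # No content overlap at all: fall back to the global mean rating.
        return sum(r for _, _, r in ratings) / len(ratings)
    return num / den


# Usage: "carol" and "m3" have no ratings; only their tags/keywords are known.
ratings = [("alice", "m1", 5), ("alice", "m2", 1),
           ("bob", "m1", 4), ("bob", "m2", 2)]
user_tags = {"alice": {"sci-fi", "space"}, "bob": {"space", "drama"},
             "carol": {"sci-fi", "space"}}
item_keywords = {"m1": {"robot", "space"}, "m2": {"romance", "paris"},
                 "m3": {"robot", "ai"}}
print(round(predict_indirect(ratings, user_tags, item_keywords, "carol", "m3"), 3))  # 4.75
```

The weighted sum runs over every observed rating, so this naive form costs one pass over the rating data per prediction. It also keeps each prediction traceable to the specific tags and keywords that produced it, which reflects the interpretability the abstract emphasizes.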
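Contribution 3 builds several approximations of the rating matrix from the few subgroups whose collaborative filtering results are good, then aggregates them with weights. The abstract specifies neither how a subgroup's result is extended into a full-size approximation nor how the weights are chosen. The NumPy sketch below therefore covers only the selection-and-aggregation step, under three assumptions: each subgroup has already produced a full `(n_users, n_items)` approximation, approximations are ranked by RMSE on held-out ratings with only the best `top_k` kept, and the kept ones are weighted by inverse RMSE. The function name, `top_k`, and the inverse-error weighting are hypothetical choices.

```python
import numpy as np


def aggregate_approximations(approxs, val_rows, val_cols, val_ratings, top_k=2):
    """Weighted aggregation of rating-matrix approximations (hypothetical sketch).

    approxs: list of (n_users, n_items) arrays, one per subgroup.
    val_rows, val_cols, val_ratings: held-out observed ratings used for scoring.
    """
    # Score each approximation by RMSE on the held-out entries.
    rmse = np.array([
        np.sqrt(np.mean((a[val_rows, val_cols] - val_ratings) ** 2))
        for a in approxs
    ])
    # Keep only the best-performing subgroups; the rest are left out of the mix.
    keep = np.argsort(rmse)[:top_k]
    # Inverse-error weights (an assumption), normalised to sum to 1.
    weights = 1.0 / (rmse[keep] + 1e-12)
    weights /= weights.sum()
    combined = sum(w * approxs[k] for w, k in zip(weights, keep))
    return combined, keep, weights


truth = np.array([[5.0, 3.0, 1.0], [4.0, 2.0, 5.0]])
approxs = [truth + 0.5, truth - 1.0, truth + 3.0]  # RMSE 0.5, 1.0, 3.0
rows, cols = np.array([0, 1]), np.array([0, 2])
combined, keep, weights = aggregate_approximations(
    approxs, rows, cols, truth[rows, cols])
print(keep.tolist())                  # [0, 1]
print(np.round(weights, 3).tolist())  # [0.667, 0.333]
```

In this toy case the two kept approximations err in opposite directions, so the weighted combination recovers `truth`, while the worst approximation never enters the result.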
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Doctoral
[Year degree awarded]: 2016
[Classification number]: TP391.3
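[Similar literature]
Related journal articles (top 10)
1 杨风召; A collaborative filtering algorithm based on feature tables [J]; 计算机工程与应用; 2007, No. 06
2 王岚, 翟正军; A time-weighted collaborative filtering algorithm [J]; 计算机应用; 2007, No. 09
3 曾子明, 张李义; An intelligent shopping-guide system based on multi-attribute decision making and collaborative filtering [J]; 武汉大学学报(工学版); 2008, No. 02
4 张富国; Research on a trust-based collaborative filtering algorithm for users with multiple interests [J]; 小型微型计算机系统; 2008, No. 08
5 侯翠琴, 焦李成, 张文革; A collaborative filtering algorithm that compresses the sparse user rating matrix [J]; 西安电子科技大学学报; 2009, No. 04
6 廖新考; Hybrid collaborative filtering recommendation based on user features and item attributes [J]; 福建电脑; 2010, No. 07
7 沈磊, 周一民, 李舟军; A collaborative filtering recommendation method based on a psychological model [J]; 计算机工程; 2010, No. 20
8 徐红, 彭黎, 郭艾寅, 徐云剑; Research on improving collaborative filtering strategies based on multiple user interests [J]; 计算机技术与发展; 2011, No. 04
9 焦晨斌, 王世卿; A hybrid collaborative filtering algorithm based on model filling [J]; 微计算机信息; 2011, No. 11
10 郑婕, 鲍海琴; Research on a personalized online teaching platform based on collaborative filtering recommendation technology [J]; 科技风; 2012, No. 06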
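Related conference papers (top 10)
1 沈杰峰, 杜亚军, 唐俊; A collaborative filtering algorithm based on item classification [A]; 第二十二届中国数据库学术会议论文集(技术报告篇) [C]; 2005
2 周军锋, 汤显, 郭景峰; An optimized collaborative filtering recommendation algorithm [A]; 第二十一届中国数据库学术会议论文集(研究报告篇) [C]; 2004
3 董全德; Research on a collaborative filtering algorithm based on dual information sources [A]; 全国第20届计算机技术与应用学术会议(CACIS·2009)暨全国第1届安全关键技术与应用学术会议论文集(上册) [C]; 2009
4 张光卫, 康建初, 李鹤松, 刘常昱, 李德毅; A scenario-oriented collaborative filtering recommendation algorithm [A]; 中国系统仿真学会第五次全国会员代表大会暨2006年全国学术年会论文集 [C]; 2006
5 李建国, 姚良超, 汤庸, 郭欢; A collaborative filtering recommendation algorithm based on degree of cognition [A]; 第26届中国数据库学术会议论文集(B辑) [C]; 2009
6 王明文, 陶红亮, 熊小勇; A collaborative filtering recommendation algorithm with iterative bidirectional clustering [A]; 第三届全国信息检索与内容安全学术会议论文集 [C]; 2007
7 胡必云, 李舟军, 王君; A psychometrics-based similarity method for collaborative filtering (in English) [A]; NDBC2010第27届中国数据库学术会议论文集(B辑) [C]; 2010
8 林丽冰, 师瑞峰, 周一民, 李月雷; A biclustering-based collaborative filtering recommendation algorithm [A]; 2008'中国信息技术与应用学术论坛论文集(一) [C]; 2008
9 罗喜军, 王韬丞, 杜小勇, 刘红岩, 何军; Category-based recommendation: a method for solving the cold-start problem in collaborative recommendation [A]; 第二十四届中国数据库学术会议论文集(研究报告篇) [C]; 2007
10 黄创光, 印鉴, 汪静, 刘玉葆, 王甲海; A collaborative filtering recommendation algorithm with uncertain nearest neighbors [A]; NDBC2010第27届中国数据库学术会议论文集A辑一 [C]; 2010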
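Related doctoral dissertations (top 10)
1 纪科; Research on hybrid collaborative filtering recommendation algorithms incorporating context information [D]; Beijing Jiaotong University; 2016
2 李聪; Research on bottleneck problems of collaborative filtering in e-commerce recommender systems [D]; Hefei University of Technology; 2009
3 郭艳红; Research on collaborative filtering algorithms and applications for recommender systems [D]; Dalian University of Technology; 2008
4 罗恒; Research on restricted Boltzmann machines from a collaborative filtering perspective [D]; Shanghai Jiao Tong University; 2011
5 薛福亮; Research on factors affecting the quality of collaborative filtering recommendation in e-commerce and on mechanisms for improving it [D]; Tianjin University; 2012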
6 高e,
Article ID: 1442363
Article link: https://www.wllwen.com/shoufeilunwen/xxkjbs/1442363.html