Research on Key Problems of Collaborative Filtering Algorithms in Recommender Systems

Published: 2018-06-25 03:36

Topics: recommender systems + collaborative filtering; Source: Yangzhou University, master's thesis, 2016

[Abstract]: With the development of Web technology on the Internet, users no longer simply retrieve information from the network but actively produce it. The sharp growth in the number of users, together with this user-centered mode of information production, has caused the volume of information on the Internet to grow rapidly, a phenomenon known as "information overload": faced with massive amounts of information, people cannot quickly and accurately find what is useful to them. Recommender systems emerged to address this problem. A recommender system does not require users to state precise needs; instead, it analyzes their past behavior to infer what information they are likely to need in the future. Among the many recommendation techniques, collaborative filtering has been widely applied in e-commerce because of its distinctive advantages. Although research on collaborative filtering has produced many results, problems such as "cold start", "scalability", and "data sparsity" remain unsolved and reduce the accuracy of the algorithms. How to solve these problems and improve the performance of collaborative filtering has long been a key research topic in recommender systems. The main work of this thesis is as follows.
First, to address the "cold start" and "scalability" problems in collaborative filtering, the thesis proposes ID-CF, a collaborative filtering recommendation algorithm that incorporates clustering on user attributes. The algorithm combines item-based collaborative filtering with the K-means algorithm through a weighting scheme, which significantly improves recommendation accuracy. Because the item-item similarities and the user clusters can be computed offline, the approach addresses the scalability problem. When a new user joins the system, the clustering algorithm assigns the user to the closest user cluster, so the user's ratings of items can be predicted quickly, which also mitigates the cold-start problem.
Second, because "data sparsity" strongly affects the accuracy of collaborative filtering, the thesis proposes NG-CF, a collaborative filtering recommendation algorithm that incorporates a graph model. NG-CF introduces a new similarity measure in which the similarity between users or between items is derived from the relationships between vertices in a graph; predictions are then generated with the K-nearest-neighbor algorithm. Experiments show that the predictions remain fairly stable even when the sparsity of the data is varied.
"Cold start", "scalability", and "data sparsity" are active topics in collaborative filtering research. Building on prior work, this thesis offers only some exploration and analysis, and many issues remain to be improved.
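The abstract does not give the weighting formula used by ID-CF, so the following is only a minimal sketch of the general idea, under stated assumptions: an item-based prediction is blended linearly with the mean rating of the user's cluster, and a new user with no ratings falls back to the cluster mean alone. The toy data, the cluster assignments (assumed to come from an offline K-means run on user attributes), the weight `alpha`, and all function names are hypothetical and not taken from the thesis.

```python
from math import sqrt

# Toy user-item ratings (hypothetical data); a missing key means "not rated".
ratings = {
    "u1": {"A": 5.0, "B": 4.0, "C": 1.0},
    "u2": {"A": 4.0, "B": 5.0, "C": 2.0},
    "u3": {"A": 1.0, "B": 2.0, "C": 5.0},
}

# Assumed output of an offline K-means run on user attributes (not computed here).
clusters = {"u1": 0, "u2": 0, "u3": 1}


def item_cosine(i, j):
    """Cosine similarity between items i and j over users who rated both."""
    common = [u for u, r in ratings.items() if i in r and j in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in common)
    norm_i = sqrt(sum(ratings[u][i] ** 2 for u in common))
    norm_j = sqrt(sum(ratings[u][j] ** 2 for u in common))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0


def item_based_prediction(user_ratings, target):
    """Similarity-weighted average of the user's own ratings, or None if unavailable."""
    num = den = 0.0
    for item, rating in user_ratings.items():
        if item == target:
            continue
        sim = item_cosine(item, target)
        num += sim * rating
        den += abs(sim)
    return num / den if den else None


def cluster_mean(cluster_id, target):
    """Mean rating of the target item among known users in the given cluster."""
    values = [r[target] for u, r in ratings.items()
              if clusters[u] == cluster_id and target in r]
    return sum(values) / len(values) if values else None


def blended_prediction(user_ratings, cluster_id, target, alpha=0.5):
    """Assumed linear blend: alpha * item-based + (1 - alpha) * cluster mean."""
    cf = item_based_prediction(user_ratings, target)
    cm = cluster_mean(cluster_id, target)
    if cf is None:
        return cm  # cold start: no usable ratings, rely on the cluster alone
    if cm is None:
        return cf
    return alpha * cf + (1 - alpha) * cm


# A brand-new user assigned to cluster 0 has no ratings yet.
print(blended_prediction({}, 0, "C"))  # 1.5, the mean of cluster 0's ratings for C
# A user with some ratings gets a blend of both signals.
print(blended_prediction({"A": 5.0, "B": 4.0}, 0, "C"))
```

In this sketch both the item-item similarities and the cluster assignments depend only on historical data, so they could be precomputed offline, which is the property the abstract credits for scalability.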
[Degree-granting institution]: Yangzhou University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.3

[References]