Research on Similarity-Based Collaborative Filtering Algorithms in Recommender Systems

Published: 2018-06-28 06:16

Topics: collaborative filtering + sparsity; Source: Zhengzhou University, master's thesis, 2017

【Abstract】: With the rapid development of information technology and the continuous growth in the number of Internet users and the volume of information, information production in cyberspace has shifted from traditional media toward the general public and online platforms. As individual influence grows and artificial intelligence becomes widespread, everyone is a producer of information, and so is every device connected to the network. This abundance gives users plenty of content but also creates a serious difficulty: finding the information they actually need in a vast, cluttered data space. This problem is known as information overload, and recommender systems have emerged as an effective way to alleviate it. Collaborative filtering, one family of recommendation techniques, uses information shared among users to produce reasonable recommendations and adjusts them dynamically from user feedback; because it does not depend on specialist domain knowledge, it has improved recommendation performance. Collaborative filtering nevertheless has problems of its own. As the numbers of users and items keep growing, the curse of dimensionality becomes unavoidable and shows up as data sparsity, so making effective use of the data that does exist is the key to solving the problems recommender systems face. This thesis addresses sparsity from the angle of similarity computation. First, within the item domain associated with each user, users are classified by item attributes to obtain each user's initial interest distribution. This distribution is quantified to extract the user's own interest features; a new similarity is obtained by computing the differences between users' interest features and adjusting them quantitatively. Predicted ratings are then computed within the collaborative filtering framework to complete the recommendation process. Second, based on the membership characteristics of each user's rating structure and the differences between users' rating scales, two similarity measures are derived: one reflecting the structure of a user's ratings and one reflecting their disorder. Within the limited user data available, these two properties are fused into a new similarity measure. Finally, the similarity measures derived from interest distribution and from rating structure both capture user-to-user similarity more accurately during collaborative filtering. Comparative experiments show that, with the new similarity measures, the recommendation algorithm mitigates the effects of sparsity and reduces recommendation error.
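The abstract names its techniques but gives no formulas, so the following is only a minimal Python sketch of the general ideas, not the thesis's method. Everything concrete in it is an assumption: the toy data (`RATINGS`, `ITEM_GENRE`) and all function names are hypothetical; a single genre per item stands in for "item attributes"; one minus the total variation distance stands in for the "difference of interest features" (the thesis's quantitative adjustment is omitted); the overlap of rating-level distributions stands in for the membership-based rating-structure measure; Shannon entropy stands in for rating "disorder"; and the fusion weight `alpha` is arbitrary. Prediction uses the standard mean-centred user-based collaborative filtering formula.

```python
import math
from collections import Counter

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i4": 5},
    "u3": {"i2": 5, "i3": 5, "i4": 2},
}
# Hypothetical item attribute (one genre per item) used to build interest distributions.
ITEM_GENRE = {"i1": "action", "i2": "drama", "i3": "action", "i4": "comedy"}
LEVELS = (1, 2, 3, 4, 5)  # discrete rating levels


def user_mean(user):
    ratings = RATINGS[user].values()
    return sum(ratings) / len(ratings)


def distribution_similarity(p, q):
    """1 minus total variation distance: 1.0 for identical distributions, 0.0 for disjoint ones."""
    keys = set(p) | set(q)
    return 1.0 - 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Idea 1 (sketch): similarity from interest distributions over item attributes.
def interest_distribution(user):
    """Share of the user's rated items that fall into each genre."""
    counts = Counter(ITEM_GENRE[item] for item in RATINGS[user])
    total = sum(counts.values())
    return {genre: c / total for genre, c in counts.items()}


def interest_similarity(u, v):
    return distribution_similarity(interest_distribution(u), interest_distribution(v))


# Idea 2 (sketch): rating-structure similarity fused with a rating-disorder similarity.
def rating_level_distribution(user):
    """Share of the user's ratings at each level (an assumed stand-in for 'membership')."""
    counts = Counter(RATINGS[user].values())
    total = sum(counts.values())
    return {lvl: counts.get(lvl, 0) / total for lvl in LEVELS}


def rating_entropy(user):
    """Shannon entropy in bits of the rating-level distribution (an assumed stand-in for 'disorder')."""
    probs = rating_level_distribution(user).values()
    return -sum(p * math.log2(p) for p in probs if p > 0)


def structure_similarity(u, v, alpha=0.5):
    """alpha * level overlap + (1 - alpha) * entropy closeness; alpha is a hypothetical weight."""
    overlap = distribution_similarity(rating_level_distribution(u), rating_level_distribution(v))
    max_entropy = math.log2(len(LEVELS))  # largest possible entropy, so entropy_sim stays in [0, 1]
    entropy_sim = 1.0 - abs(rating_entropy(u) - rating_entropy(v)) / max_entropy
    return alpha * overlap + (1 - alpha) * entropy_sim


def predict(u, item, sim, k=2):
    """Mean-centred user-based CF prediction from the k most similar users who rated `item`."""
    candidates = [v for v in RATINGS if v != u and item in RATINGS[v]]
    neighbours = sorted(candidates, key=lambda v: sim(u, v), reverse=True)[:k]
    num = sum(sim(u, v) * (RATINGS[v][item] - user_mean(v)) for v in neighbours)
    den = sum(abs(sim(u, v)) for v in neighbours)
    return user_mean(u) if den == 0 else user_mean(u) + num / den


if __name__ == "__main__":
    for name, sim in (("interest", interest_similarity), ("structure", structure_similarity)):
        print(name, round(predict("u1", "i4", sim), 3))
```

Because `predict` takes the similarity function as a parameter, the two measures can be swapped and compared on the same data, e.g. `predict("u1", "i4", interest_similarity)` versus `predict("u1", "i4", structure_similarity)`. Note that neither sketched measure needs the two users to have rated the same items, which is the property that makes distribution-based similarities attractive when the rating matrix is sparse.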
【Degree-granting institution】: Zhengzhou University
【Degree level】: Master's
【Year conferred】: 2017
【CLC classification number】: TP391.3

【Related literature】