Research and Application of an Attack-Resistant Collaborative Filtering Algorithm Based on User Trustworthiness

Published: 2018-06-09 14:16

Topics: fluctuation factor + user trustworthiness; Source: Chongqing University, master's thesis, 2014

[Abstract]: Collaborative filtering is currently the most widely used and mature technique in personalized recommender systems. It analyzes users' historical behavior records to derive relationships between users or between items, uses these relationships to find each user's or item's nearest neighbors, and then uses those neighbors to recommend relevant items to the target user, thereby uncovering the user's latent interests. The technique nevertheless still suffers from data sparsity, cold start, scalability, and system vulnerability. This thesis focuses on data sparsity and system vulnerability and improves the traditional collaborative filtering algorithm so that it achieves higher recommendation accuracy on sparse data and better resistance to common attacks. First, the thesis points out that, in general, only algorithms that use missing data when computing similarity face the data sparsity problem directly. Because Pearson correlation similarity is computed without missing data (only over co-rated items), it does not face sparsity directly; instead, it faces the problem that different pairs share different numbers of co-rated items (the "fluctuation factor" problem). After introducing this problem, the thesis systematically analyzes how similarity is distributed under different fluctuation factors and, based on that distribution, proposes a simple and effective method for eliminating the fluctuation factor's influence on similarity computation. Experiments on the MovieLens dataset with both user-based and item-based collaborative filtering show that eliminating the fluctuation factor substantially improves recommendation accuracy over the original algorithms.
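The abstract does not state how the thesis removes the fluctuation-factor effect, so the sketch below only illustrates the problem plus one commonly used mitigation that is not necessarily the author's method: significance weighting, which scales a Pearson similarity by min(n, γ)/γ, where n is the number of co-rated items. It assumes ratings are stored as plain dicts and that means are taken over co-rated items only (other variants use each user's overall mean); `pearson_corated`, `significance_weighted`, and the cut-off `gamma=50` are illustrative choices.

```python
import math
from typing import Dict, Tuple

Ratings = Dict[str, float]  # item id -> rating for one user


def pearson_corated(a: Ratings, b: Ratings) -> Tuple[float, int]:
    """Pearson correlation over co-rated items only; returns (sim, n_corated).

    Missing ratings never enter the formula, so sparsity surfaces as a
    varying number of co-rated items n rather than as missing values.
    """
    common = set(a) & set(b)
    n = len(common)
    if n < 2:
        return 0.0, n
    ma = sum(a[i] for i in common) / n
    mb = sum(b[i] for i in common) / n
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = math.sqrt(sum((a[i] - ma) ** 2 for i in common)) * math.sqrt(
        sum((b[i] - mb) ** 2 for i in common)
    )
    return (num / den if den > 0 else 0.0), n


def significance_weighted(a: Ratings, b: Ratings, gamma: int = 50) -> float:
    """Shrink similarities backed by fewer than gamma co-rated items toward 0."""
    sim, n = pearson_corated(a, b)
    return sim * min(n, gamma) / gamma


u1 = {"m1": 5, "m2": 1}
u2 = {"m1": 4, "m2": 2, "m3": 5}  # agrees with u1 on m1 and m2
u3 = {"m1": 1, "m2": 5, "m4": 3}  # disagrees with u1 on m1 and m2
```

With only two co-rated items, any pair of non-constant rating vectors gives a similarity of exactly +1 or -1 (two points always lie on a line), so `u1` looks like a perfect neighbor of `u2` and a perfect opposite of `u3`. This is the kind of distortion the fluctuation factor describes; the weighted score shrinks such thinly supported pairs toward 0 (here to 0.04).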
Next, to address the vulnerability of collaborative filtering to attacks, the thesis takes the perspective of user trustworthiness: it proposes a simple statistics-based method for computing each user's trust degree and uses these trust degrees to strengthen the algorithm's attack resistance. The improved similarity computation, which eliminates the fluctuation factor, is then applied to traditional collaborative filtering and combined with user trustworthiness to form an attack-resistant collaborative filtering algorithm. Experiments on the MovieLens dataset, conducted to verify the recommendation and attack-resistance capabilities of the improved algorithm, show that it considerably outperforms the original algorithm in both recommendation accuracy and attack resistance. Finally, the research is applied to the "Fourth-Party Employment Information Platform" project.
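The abstract says only that trust is computed with a simple statistical method and then combined with similarity. The sketch below is a hypothetical stand-in, not the thesis's formula: it scores each user by the mean absolute deviation of their ratings from the item means (similar in spirit to the deviation-based features used in shilling-attack detection), maps that to trust = 1 / (1 + deviation), and multiplies Pearson similarity by the neighbor's trust inside a standard mean-centered user-based kNN prediction. The names `trust_scores` and `predict`, the toy data, and the neighborhood size `k` are all illustrative assumptions.

```python
import math
from typing import Dict

Ratings = Dict[str, float]   # item id -> rating for one user
Matrix = Dict[str, Ratings]  # user id -> that user's ratings


def pearson(a: Ratings, b: Ratings) -> float:
    """Pearson correlation over co-rated items (0.0 if undefined)."""
    common = set(a) & set(b)
    n = len(common)
    if n < 2:
        return 0.0
    ma = sum(a[i] for i in common) / n
    mb = sum(b[i] for i in common) / n
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = math.sqrt(sum((a[i] - ma) ** 2 for i in common)) * math.sqrt(
        sum((b[i] - mb) ** 2 for i in common)
    )
    return num / den if den > 0 else 0.0


def trust_scores(R: Matrix) -> Dict[str, float]:
    """Hypothetical trust: 1 / (1 + mean |rating - item mean|) per user."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for ratings in R.values():
        for item, r in ratings.items():
            sums[item] = sums.get(item, 0.0) + r
            counts[item] = counts.get(item, 0) + 1
    item_mean = {i: sums[i] / counts[i] for i in sums}
    trust = {}
    for user, ratings in R.items():
        dev = sum(abs(r - item_mean[i]) for i, r in ratings.items()) / len(ratings)
        trust[user] = 1.0 / (1.0 + dev)
    return trust


def predict(R: Matrix, trust: Dict[str, float], user: str, item: str, k: int = 20) -> float:
    """Mean-centered user-based kNN with weights sim(u, v) * trust(v)."""
    mean_u = sum(R[user].values()) / len(R[user])
    neighbors = []
    for v, rv in R.items():
        if v == user or item not in rv:
            continue
        w = pearson(R[user], rv) * trust[v]
        if w != 0.0:
            neighbors.append((w, v))
    # Keep the k neighbors with the largest absolute weight.
    neighbors.sort(key=lambda t: abs(t[0]), reverse=True)
    top = neighbors[:k]
    den = sum(abs(w) for w, _ in top)
    if den == 0.0:
        return mean_u  # no usable neighbors: fall back to the user's mean
    num = sum(w * (R[v][item] - sum(R[v].values()) / len(R[v])) for w, v in top)
    return mean_u + num / den


R: Matrix = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 3, "m3": 4, "m4": 4},
    "carol": {"m1": 5, "m2": 2, "m3": 5, "m4": 5},
    # Hypothetical attack profile: mimics alice on filler items, nukes m4.
    "shill": {"m1": 5, "m2": 1, "m3": 5, "m4": 1},
}
trust = trust_scores(R)
p_trust = predict(R, trust, "alice", "m4")
p_flat = predict(R, {u: 1.0 for u in R}, "alice", "m4")
```

In this toy matrix the `shill` profile copies `alice`'s taste on the filler items m1 to m3 and gives the target item m4 the minimum rating. It receives the lowest trust (0.48, against 0.6 or more for the other users), so the trust-weighted prediction for `alice` on m4 is about 3.79, versus about 3.67 when every user is trusted equally. A single global deviation statistic is easy to evade (an attacker can stay close to the item means on filler items), so this should be read as an illustration of trust weighting rather than as a robust defense.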
[Degree-granting institution]: Chongqing University
[Degree level]: Master's
[Year awarded]: 2014
[Classification number (CLC)]: TP391.3; TP393.08