Research on Privacy-Preserving Recommendation Algorithms

Published: 2018-05-27 08:14

Topic: collaborative filtering + privacy protection; Reference: Beijing Jiaotong University, master's thesis, 2017

[Abstract]: Recommender systems are a highly successful technology in e-commerce. They alleviate the information overload brought about by the rapid growth of the Internet, and they mine users' latent needs from massive data according to their behavior and preferences in order to provide personalized recommendations. Collaborative filtering is the most widely used class of recommendation algorithms; its basic idea is to make predictions for a target user from the preferences of users who share that user's interests. However, collaborative filtering is also vulnerable to attacks by malicious users. The common attack models are the shilling attack and the kNN attack. In a shilling attack, the attacker constructs a number of fake users whose profiles resemble those of genuine users in order to distort the algorithm's predictions, raising or lowering the predicted ratings of certain items. In a kNN attack, the attacker constructs a number of fake users whose profiles resemble that of the target user in order to obtain that user's private information. Either kind of attack harms users' interests and erodes their trust in the recommender system, so privacy protection in recommendation algorithms has become an active research topic. This thesis studies both the shilling attack model and the kNN attack model and proposes a solution for each. The main results are as follows.

First, based on how shilling attacks are mounted and on their characteristics, the thesis examines the main existing algorithms for resisting them. Current solutions are mainly attack detection methods and robust collaborative filtering algorithms, which suffer from a high false positive rate and inaccurate predictions. To address these shortcomings, the thesis proposes a soft-decision method: it first applies a support vector machine (SVM) to obtain a suspicion degree for each user, then builds variable-length partitions for neighbor selection, and finally selects the neighbors most similar to the target user while satisfying a given security metric. Because suspicious users are tagged rather than deleted outright, genuine users who were wrongly judged to be fake can still contribute to the similarity computation, which lowers the false positive rate. Experimental results show that the algorithm achieves better prediction accuracy when resisting shilling attacks.

Second, based on the characteristics of the kNN attack model, the thesis examines existing privacy-preserving collaborative filtering algorithms. These rely mainly on encryption, random perturbation, and obfuscation, which suffer from high computational cost, low data utility, and noise levels that are hard to tune. The thesis therefore studies the application of k-anonymity to privacy-preserving collaborative filtering. Taking the characteristics of recommendation data sets into account, it proposes a new anonymization criterion for recommendation algorithms. The method improves the microaggregation algorithm with importance-based partitioning to increase the homogeneity of users within each anonymized equivalence class, which yields better data utility. It also proposes (p, l)-diversity and (p, l, α)-diversity models that increase the differences among users and raise the level of privacy protection for users' sensitive data, where p denotes the background knowledge held by the attacker, and l and (l, α) denote the diversity among users. Experimental results show that the algorithm ensures a high level of privacy protection at a low level of information loss.
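
The abstract gives no implementation details for the soft-decision method, so the following is only a minimal sketch of the underlying idea: suspicious users are tagged with a score instead of being removed, and they may still serve as neighbors when a security budget allows it. The function name `select_neighbors`, the budget `max_mean_suspicion`, and the greedy rule are all assumptions made for illustration. The suspicion scores are taken as given (in the thesis they come from an SVM), and the variable-length partitions are not modeled here.

```python
def select_neighbors(similarity, suspicion, k, max_mean_suspicion):
    """Pick up to k neighbors, most similar first, under a suspicion budget.

    similarity: dict user -> similarity to the target user (higher is closer)
    suspicion:  dict user -> suspicion score in [0, 1] (assumed precomputed)
    max_mean_suspicion: hypothetical security metric, an upper bound on the
        mean suspicion of the selected neighbor set
    """
    ranked = sorted(similarity, key=similarity.get, reverse=True)
    chosen = []
    total_suspicion = 0.0
    for user in ranked:
        if len(chosen) == k:
            break
        candidate_total = total_suspicion + suspicion[user]
        # Accept the candidate only if the set stays within the budget.
        if candidate_total / (len(chosen) + 1) <= max_mean_suspicion:
            chosen.append(user)
            total_suspicion = candidate_total
    return chosen


similarity = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6}
suspicion = {"a": 0.1, "b": 0.9, "c": 0.2, "d": 0.0}

strict = select_neighbors(similarity, suspicion, k=3, max_mean_suspicion=0.3)
loose = select_neighbors(similarity, suspicion, k=3, max_mean_suspicion=0.6)
print(strict)  # the highly suspicious user "b" is skipped
print(loose)   # "b" is kept because the looser budget tolerates it
```

Under the strict budget the suspicious user is passed over, while under the looser one it still contributes. That is the difference from hard deletion: a wrongly flagged genuine user is not lost for good.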
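
As a rough illustration of the k-anonymity side, the sketch below runs a plain one-dimensional microaggregation and then a standard distinct l-diversity check. It is not the importance-based microaggregation or the (p, l)-diversity model proposed in the thesis, which the abstract does not define in detail. All function names and the toy data are hypothetical.

```python
from statistics import mean


def microaggregate(values, k):
    """Group record indices into classes of at least k records each.

    Records are sorted by value and cut into consecutive chunks of size k;
    a short final chunk is merged into the previous one.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need at least k records and k >= 1")
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    return groups


def anonymize(values, groups):
    """Replace each value with the mean of its equivalence class."""
    result = list(values)
    for group in groups:
        centroid = mean(values[i] for i in group)
        for i in group:
            result[i] = centroid
    return result


def is_distinct_l_diverse(groups, sensitive, l):
    """True if every class holds at least l distinct sensitive values."""
    return all(len({sensitive[i] for i in g}) >= l for g in groups)


# Toy data: one quasi-identifier value and one sensitive label per user.
values = [1.0, 5.0, 2.0, 4.0, 3.0]
sensitive = ["x", "y", "z", "y", "x"]

groups = microaggregate(values, k=2)
released = anonymize(values, groups)
print(groups)
print(released)
print(is_distinct_l_diverse(groups, sensitive, l=2))
```

Each released value is shared by at least k users, which is what makes them indistinguishable on that attribute. The diversity check then guards against a class whose members all carry the same sensitive value.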
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3
