User Privacy Protection in Large-Scale Personalized Online Video Services

Published: 2018-05-01 22:07

Topics: privacy inference + privacy protection; source: doctoral dissertation, Beijing Jiaotong University, 2016

【Abstract】: Large-scale online video services not only account for a major share of network traffic and of the market, but also continue to hold a large lead in user numbers and effective viewing time. Personalized recommendation has become a main competitive tool for video sites, and it also brings a risk of user privacy leakage. On the one hand, a recommender system can accurately infer user profile attributes such as gender and age, which leaks user privacy. On the other hand, an attacker posing as an ordinary user can obtain a target user's historical behavior records directly from the recommender system's output and then infer that user's sensitive interest preferences; this covert indirect-access attack poses an even more serious threat to user privacy. Faced with the tension between personalized recommendation and privacy protection, existing privacy-preserving recommendation schemes generally trade one off against the other, and it has become the consensus of existing work that protecting user privacy costs recommendation performance. For large-scale online video services, whether and how user privacy can be protected while recommendation quality is maintained or even improved is a current research difficulty. It is also a pressing problem for other large-scale online services.

To address this problem, the dissertation first analyzes the high risk of privacy leakage in large-scale online video services and, taking user gender as an example, analyzes the feasibility of accurately inferring private user information from a small number of browsing records. Then, to protect identity attributes such as gender and age and to protect sensitive interest preferences, it studies, respectively, a recommendation-friendly privacy protection framework and a differentially private collaborative filtering algorithm based on topic privacy importance. Together these achieve the research goal of protecting user privacy while maintaining or even improving recommendation quality. The main work and contributions are as follows.

First, on user privacy inference: to cope with the high data sparsity of real online video systems, the dissertation proposes different user-behavior aggregation methods for Chinese and English video systems. Specifically, it designs a simple and effective keyword extraction algorithm for Asian languages that have no word delimiters, and for English video systems it proposes a thesaurus-based aggregation method that retains the original information to a great extent. To address the imbalance in user gender distribution, it proposes a new evaluation measure and builds an improved privacy inference model on it. Experiments on datasets from several large-scale online video systems show that, compared with prior work, the method not only handles the high sparsity and gender imbalance of real systems effectively but also achieves the best overall gender inference results. This study confirms that, in video systems with highly sparse data, a small number of records can expose private user information.

Second, to protect private attributes such as age and gender without losing recommendation performance, the dissertation proposes a recommendation-friendly privacy protection framework. The existing approach adds to a user's viewing record a certain number of fictitious ratings for videos popular with users of the opposite category, which obfuscates the user's attributes but sacrifices recommendation accuracy. This approach overlooks an important fact: an individual user may well like the content that is, statistically, most popular with users of the opposite category (gender, age group, and so on). Based on this observation, the dissertation proposes a new video similarity measure and designs a video selection strategy and a fictitious-rating estimation method that both obfuscate the user's gender (or age) and reinforce the user's interests. Extensive experiments show that, compared with the trade-off approach of prior work, the proposed framework protects attributes such as gender and age while maintaining or even improving recommendation quality, and that it can be extended to similar recommender systems for books, CDs, and music.

Third, against typical indirect-access attacks on user behavior records, the dissertation proposes a differentially private collaborative filtering algorithm based on topic privacy importance. Existing differentially private collaborative filtering algorithms protect all of a user's behavior records with the same strength. Although the average recommendation error remains acceptable, performance on Top-k recommendation, which real systems commonly use, is severely degraded. To address this, based on the observation that users differ in how sensitive they are to the leakage of different behavior records, and taking into account the pronounced sparsity of user behavior in video systems, the dissertation proposes privacy protection that differentiates privacy importance at the level of video topics. It introduces a topic privacy importance parameter and, under the same privacy budget, provides stronger protection for topics of high privacy importance. To improve recommendation quality, the recommender system's output is further re-ranked and filtered on the client side according to the user's interest preferences to produce the Top-k video recommendations. Experiments confirm that, under the same privacy budget, the algorithm provides differentiated protection for user interest preferences while effectively improving the precision and recall of Top-k video recommendation in collaborative filtering systems.
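The idea behind the third contribution (stronger protection for topics of higher privacy importance under one fixed privacy budget) can be illustrated with a minimal sketch. This is not the dissertation's algorithm, which the abstract does not specify. It assumes a standard Laplace mechanism on per-topic counts, a hypothetical rule that splits the total budget in inverse proportion to each topic's importance weight, and sequential composition so that the per-topic budgets sum to the total. The function names, topic names, and weights are all illustrative.

```python
import random


def allocate_budget(importance, total_epsilon):
    """Split total_epsilon across topics in inverse proportion to importance.

    A higher privacy importance yields a smaller epsilon, i.e. more noise.
    The inverse-proportional rule is an assumption made for illustration.
    """
    inverse = {topic: 1.0 / weight for topic, weight in importance.items()}
    norm = sum(inverse.values())
    return {topic: total_epsilon * v / norm for topic, v in inverse.items()}


def perturb_topic_counts(counts, epsilons, sensitivity=1.0, seed=None):
    """Add Laplace noise with scale sensitivity / epsilon to each topic count."""
    rng = random.Random(seed)
    noisy = {}
    for topic, count in counts.items():
        scale = sensitivity / epsilons[topic]
        # The difference of two i.i.d. exponentials with mean `scale`
        # is Laplace-distributed with location 0 and that scale.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy[topic] = count + noise
    return noisy


if __name__ == "__main__":
    # Hypothetical weights: "health" is treated as four times as sensitive.
    importance = {"news": 1.0, "health": 4.0}
    epsilons = allocate_budget(importance, total_epsilon=1.0)
    print(epsilons)
    print(perturb_topic_counts({"news": 10, "health": 3}, epsilons, seed=0))
```

With these weights the sensitive topic receives the smaller share of the budget and therefore the larger noise scale, while the shares still sum to the total budget. A uniform split would instead give every topic the same noise, which corresponds to the equal-strength protection that the abstract attributes to existing algorithms.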

【Degree-granting institution】: Beijing Jiaotong University
【Degree level】: Doctoral
【Year degree conferred】: 2016
【Classification numbers】: TP391.3; TP309; TP393.092

【References】