Research on Collaborative Filtering Recommendation Incorporating Online User Reviews

Published: 2018-02-03 07:13

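Keywords: collaborative filtering; user reviews; topic model; fusion strategy　Source: South China University of Technology, 2016 master's thesis　Type: degree thesis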

[Abstract]: With the rapid growth of the Internet, the volume of product-related data online has increased sharply, and products have become more diverse, more numerous, and spread across increasingly complex categories. The Internet has entered the era of big data, yet because of "information overload", users cannot quickly and accurately locate the products that interest them. Personalized recommender systems emerged in response: by capturing each user's individual needs and characteristics, they push suitable services to users in suitable contexts and guide them to the information they need, thereby addressing the "information overload" problem. Personalized recommendation is widely applied in e-commerce, advertising, and mobile platforms, and among the many recommendation algorithms, collaborative filtering is the most studied and most widely used. However, collaborative filtering suffers from data sparsity, and it considers only the ratings users give to products while ignoring the high-value product reviews they write. This thesis therefore proposes a collaborative filtering recommendation algorithm that incorporates user reviews, combining review text with the traditional rating data. It applies the LDA (Latent Dirichlet Allocation) topic model and the Rocchio algorithm to mine the review text, and models user preferences while accounting for differences in how much attention users pay to salient topics. On this basis, it proposes two fusion strategies, similarity fusion and rating fusion, and two weighting strategies, static weighting and dynamic weighting, to combine review text with rating data and produce the final recommendation. Because the preference model is built from all of a user's reviews rather than only from the items two users have both rated, the approach substantially alleviates rating sparsity; combining traditional collaborative filtering with topic preferences derived from review text also yields more accurate user neighbors and hence more accurate recommendations. Finally, public Chinese and English datasets and corresponding evaluation metrics are selected, and comparative experiments are designed to verify the effectiveness of the fusion algorithm. The results show that the proposed fusion algorithm significantly improves the recommendation performance of traditional collaborative filtering; that similarity fusion outperforms rating fusion; that dynamic weighting improves performance more than static weighting; and that extracting the salient LDA topics before weighted fusion improves performance further.
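The similarity-fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. The sketch assumes each user already has a topic-preference vector (in the thesis this would come from LDA and Rocchio; here it is simply given), uses cosine similarity for both signals, and blends them with a weight `alpha`. The function names, the `gamma` parameter, and the dynamic rule `alpha = n / (n + gamma)` are hypothetical choices made for illustration; the abstract does not state the actual weighting formula.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rating_similarity(ratings_u, ratings_v):
    """Cosine similarity over co-rated items only.

    ratings_u, ratings_v: dicts mapping item id -> rating.
    Returns (similarity, number of co-rated items).
    """
    common = sorted(set(ratings_u) & set(ratings_v))
    if not common:
        return 0.0, 0
    sim = cosine([ratings_u[i] for i in common],
                 [ratings_v[i] for i in common])
    return sim, len(common)


def fused_similarity(ratings_u, ratings_v, topics_u, topics_v,
                     alpha=0.5, dynamic=False, gamma=5):
    """Blend rating-based and topic-based user similarity.

    Static weighting: use the fixed alpha passed in.
    Dynamic weighting (assumed rule, not from the thesis): trust the
    rating signal more as the number of co-rated items grows.
    """
    sim_rating, n_common = rating_similarity(ratings_u, ratings_v)
    sim_topic = cosine(topics_u, topics_v)
    if dynamic:
        alpha = n_common / (n_common + gamma)
    return alpha * sim_rating + (1 - alpha) * sim_topic
```

Under these assumptions, two users with no co-rated items get a rating similarity of zero, so with dynamic weighting the fused score falls back entirely on the topic vectors. This mirrors the abstract's point that review text remains usable when rating overlap is sparse.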
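[Degree-granting institution]: South China University of Technology
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: F724.6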
