Research on Personalized Recommendation Based on Online Reviews

Published: 2018-03-18 01:13

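Topic: online reviews; Focus: LDA topic model; Source: Nanjing University of Finance and Economics (南京财经大学), 2016 master's thesis; Document type: degree thesis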

【Abstract】: With the rapid development of the Internet, we are surrounded by an enormous volume of online information, and this information plays an increasingly important role in daily life. This is especially true in e-commerce, where people shop every day and generate large amounts of product information and review information. Extracting valuable content from this mass of text can greatly improve the consumer's shopping experience and raise the rate at which products are sold, and the problem has attracted a wave of research interest in both academia and commercial applications. A recommender system builds a model by examining users' past behavior data and the correlations between those behaviors and the attributes of the products themselves, with the goal of predicting future behavior from past behavior. Put simply, in practice it increases business volume by recommending products that users are likely to be interested in. Earlier recommender systems focused mainly on content-based recommendation: the attribute features of other products are compared with those of products the user has previously purchased or selected, and products with high similarity are recommended. Building on this, the thesis considers not only the descriptive attributes of the product but also ratings and reviews, which improves recommendation accuracy.

The thesis first uses a web crawler to collect product information and preprocesses the collected review text, including word segmentation; the words that remain after preprocessing form a dictionary. Because the number of feature words is large, an improved LDA topic model is used for feature extraction, combined with TF-IDF, to select features at different granularities, mine topic information, and compute each text's probability distribution and weights over the topics. Finally, the thesis combines a user interest model with a sigmoid function to improve the transition from attribute features to review features when computing product similarity under cold-start conditions. The similarity between texts is computed with the Euclidean distance formula, and the products with the highest similarity are output as the recommendation list.

Book data from the Amazon China website serve as the experimental data. The experiments also examine how changing the number of topics affects each text's probability distribution over topics. In addition, the thesis evaluates recommendation performance across different choices of feature items and different feature extraction methods, using precision, recall, and F-Measure. The experimental results show that, compared with traditional recommendation methods, the proposed method recommends more accurately once review text is taken into account and the improvements are applied.
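The similarity and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no formulas, so the sigmoid's input (a review count for the product pair), its `midpoint` and `steepness` parameters, the mapping 1/(1+d) from distance to similarity, and all function names are assumptions made for illustration. The topic vectors are assumed to come from an upstream LDA step that is not shown.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_to_similarity(d):
    """Map a distance in [0, inf) to a similarity in (0, 1].

    Assumed mapping; the abstract does not say how distance becomes similarity.
    """
    return 1.0 / (1.0 + d)


def review_weight(n_reviews, midpoint=20.0, steepness=0.25):
    """Sigmoid weight given to review features.

    Hypothetical parameters: the weight is near 0 with few reviews
    (cold start) and approaches 1 as reviews accumulate.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (n_reviews - midpoint)))


def blended_similarity(attr_a, attr_b, topics_a, topics_b, n_reviews):
    """Blend attribute similarity and review-topic similarity.

    attr_*   : attribute feature vectors of the two products
    topics_* : topic distributions of their review text (assumed LDA output)
    n_reviews: review count used to set the blend (assumed input)
    """
    w = review_weight(n_reviews)
    s_attr = distance_to_similarity(euclidean_distance(attr_a, attr_b))
    s_rev = distance_to_similarity(euclidean_distance(topics_a, topics_b))
    return (1.0 - w) * s_attr + w * s_rev


def precision_recall_f1(recommended, relevant):
    """Precision, recall, and F1 of a recommendation list against a relevant set."""
    rec, rel = set(recommended), set(relevant)
    hits = len(rec & rel)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(rel) if rel else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up vectors: identical attributes, different review topics.
    for n in (0, 20, 100):
        s = blended_similarity([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], n)
        print(n, round(s, 3))
    print(precision_recall_f1(["a", "b", "c", "d"], ["a", "b", "e"]))
```

With few reviews the blended score is dominated by attribute similarity, and it shifts toward review-topic similarity as the review count grows, which is one plausible reading of the "transition" the abstract describes for the cold-start case.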
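【Degree-granting institution】: Nanjing University of Finance and Economics (南京财经大学)
【Degree level】: Master's
【Year degree awarded】: 2016
【Classification numbers】: TP391.3;F713.36;F274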

【References】