Research on a Recommendation Algorithm Based on Improved Random Forest

Published: 2018-07-09 15:59

Topic: random forest + recommendation algorithm; Source: Shenyang Ligong University, master's thesis, 2017

[Abstract]: With economic and social development, e-commerce has become an indispensable part of daily life. As the volume of information in e-commerce grows geometrically, users find it difficult to locate the products they are interested in quickly and accurately among a vast number of listings. Personalized recommendation algorithms were created against this background. They shift e-commerce from passively responding to user requests to actively making recommendations, and they give users a shortcut to the items they like in an information-overloaded network. This thesis uses a recommendation algorithm based on an improved random forest model. Random forest is an ensemble learning method that combines multiple decision tree classifiers; it builds the model on feature subspaces, handles noise well, and helps avoid overfitting. The thesis describes the principles and characteristics of several typical decision forest algorithms and, starting from how a decision forest is constructed, proposes an improved random forest that fuses the support vector machine (SVM) with random forest. The base weak classifier in a random forest is the decision tree, and when a decision tree splits a node it selects the single attribute with the strongest classification ability. The thesis instead incorporates SVM into the attribute selection step: each node is split by a hyperplane formed from a linear combination of feature variables (determined by the support vectors), which separates the classes better than any single attribute and thereby improves how the trees in the forest are built. Experimental analysis shows that the improved random forest algorithm achieves higher accuracy. The thesis uses real historical user behavior data from Alibaba's online platform. By mining user behavior, it builds an improved random forest model and produces a list of recommended products for each user. The experiments show that, given users' historical behavior data, the approach can effectively predict and recommend the products users will purchase in the future, which is significant for the development of recommendation algorithms.
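The abstract does not give the thesis's exact procedure, so the following is only a minimal sketch of the general idea it describes: a random forest whose trees split each node on an SVM hyperplane over a random feature subspace rather than on a single attribute. Everything specific here is an assumption made for illustration: the linear SVM is trained by plain stochastic gradient descent on the hinge loss, each node separates its majority class from the rest, and the names `train_linear_svm`, `build_tree`, `build_forest`, and `predict_forest`, the hyperparameters, and the synthetic four-feature data (standing in for standardized user-item behavior features, with label 1 meaning "purchased") are all hypothetical.

```python
import random
from collections import Counter

def train_linear_svm(X, y, rng, epochs=20, eta=0.01, lam=0.01):
    """Linear SVM via SGD on the L2-regularised hinge loss; y holds -1/+1."""
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]       # L2 shrinkage
            if margin < 1.0:                                # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def build_tree(X, y, n_sub, rng, depth=0, max_depth=5):
    """Oblique tree: inner nodes are (feats, w, b, left, right); leaves are labels."""
    counts = Counter(y)
    majority = counts.most_common(1)[0][0]
    if len(counts) == 1 or depth >= max_depth or len(y) < 4:
        return majority
    feats = rng.sample(range(len(X[0])), n_sub)             # random feature subspace
    Xs = [[row[f] for f in feats] for row in X]
    # Assumption: each node separates its majority class from everything else.
    target = [1 if label == majority else -1 for label in y]
    w, b = train_linear_svm(Xs, target, rng)
    left = [i for i, r in enumerate(Xs)
            if sum(wj * xj for wj, xj in zip(w, r)) + b >= 0]
    if not left or len(left) == len(y):
        return majority                                     # hyperplane did not split
    chosen = set(left)
    right = [i for i in range(len(y)) if i not in chosen]
    kids = [build_tree([X[i] for i in part], [y[i] for i in part],
                       n_sub, rng, depth + 1, max_depth) for part in (left, right)]
    return (feats, w, b, kids[0], kids[1])

def predict_tree(node, x):
    while isinstance(node, tuple):                          # descend until a leaf label
        feats, w, b, left, right = node
        score = sum(wj * x[f] for wj, f in zip(w, feats)) + b
        node = left if score >= 0 else right
    return node

def build_forest(X, y, n_trees=15, seed=0):
    rng = random.Random(seed)
    d = len(X[0])
    n_sub = min(d, max(2, round(d ** 0.5)))                 # at least 2 features per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]            # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], n_sub, rng))
    return forest

def predict_forest(forest, x):
    """Majority vote over the trees."""
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]

# Synthetic, well-separated toy data (not the Alibaba data used in the thesis).
data_rng = random.Random(0)
X, y = [], []
for _ in range(20):
    X.append([3 + data_rng.uniform(-0.5, 0.5) for _ in range(4)]); y.append(1)
    X.append([-3 + data_rng.uniform(-0.5, 0.5) for _ in range(4)]); y.append(0)

forest = build_forest(X, y)
print(predict_forest(forest, [3.0, 3.0, 3.0, 3.0]))
print(predict_forest(forest, [-3.0, -3.0, -3.0, -3.0]))
```

Requiring at least two features per split is a deliberate choice in this sketch: with a single feature the hyperplane collapses to an ordinary single-attribute threshold, which is exactly the case the abstract sets out to improve on. To produce a recommendation list rather than a class label, one could rank candidate items by the fraction of trees that vote "purchased"; that ranking step is likewise an assumption and is not described in the abstract.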
[Degree-granting institution]: Shenyang Ligong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3
