Research and Implementation of a Personalized Recommendation System Based on Logistics Data

Published: 2018-06-05 15:21

Topics: web crawler + domain feature values; Source: Nanjing University of Posts and Telecommunications, 2017 master's thesis

[Abstract]: As Internet technology has matured, online shopping has become one of the main ways people shop. The resulting surge in users and goods generates massive volumes of logistics data. In this big-data setting, the information asymmetry between logistics resources and user demand leads to excessive transport costs across the logistics network, poorly scheduled logistics resources, and delayed logistics decisions by enterprises. Traditional logistics data processing cannot accurately predict user demand, so enterprises cannot plan their logistics ahead of time. Personalized recommendation, by contrast, uses data mining to extract users' preference information from logistics data, finds items a user is likely to be interested in through similarity computation, delivers accurate recommendations to users, and gives enterprises effective data support for decision-making.
Personalized recommendation systems currently face three main questions: first, how to mine user information from massive data so that it fully reflects users' real preferences; second, how to train an effective preference model from the resulting preference data set; and third, which recommendation algorithm to choose. To study and implement a personalized recommendation system based on logistics data, this thesis takes a book sales system for an online bookstore, developed as part of the work, as its foundation. It focuses on the recommendation system built on top of that platform and addresses the data sparsity, cold-start, and scalability problems of traditional book recommendation systems.
The thesis proposes a Domain Features-Aware Recommendation Method (DFAR) based on collaborative filtering. Feature values of the items that imply a user's preferences are used to mine user information indirectly, and existing tools extract these item domain feature values automatically. The Analytic Hierarchy Process (AHP), a multi-attribute decision-making method, is used to optimize the construction of the user preference model, and the preference model is then combined with a collaborative filtering algorithm to produce recommendations. Simulation experiments show that the method extracts item domain feature values effectively, alleviates the data sparsity and cold-start problems, and substantially improves recommendation precision.
The thesis also studies and implements a personalized book recommendation system on the Hadoop platform. To cope with massive data, a distributed, parallel web crawler is implemented on the MapReduce parallel computing framework. Because traditional recommendation algorithms are too time-consuming in data processing and computation, a Hadoop-based parallel version of DFAR is implemented, which greatly improves the algorithm's efficiency and meets users' needs. Finally, drawing on practical application scenarios, the performance of the Hadoop-based parallel DFAR method is analyzed, and a book recommendation system is designed and implemented using Java Web development technologies.
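
The abstract does not give DFAR's formulas, so the following is only a minimal sketch of the general idea it describes: AHP turns pairwise judgments about item features into weights, and those weights then shape the user-to-user similarity used by collaborative filtering. Everything concrete here is an assumption for illustration: the three book features (author, category, publisher), the pairwise matrix, the function names, and the use of the row geometric-mean approximation and a weighted cosine similarity, none of which is confirmed as the thesis's own procedure.

```python
import math

# Saaty's random consistency index for small matrices (standard AHP table).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(pairwise, weights):
    """Consistency ratio CR = CI / RI; values below 0.1 are usually accepted."""
    n = len(pairwise)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(a * w for a, w in zip(row, weights)) for row in pairwise]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


def weighted_cosine(u, v, weights):
    """Cosine similarity of two preference vectors, scaled per feature."""
    dot = sum(w * a * b for w, a, b in zip(weights, u, v))
    norm_u = math.sqrt(sum(w * a * a for w, a in zip(weights, u)))
    norm_v = math.sqrt(sum(w * b * b for w, b in zip(weights, v)))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical judgments: author vs. category vs. publisher (assumed values).
PAIRWISE = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(PAIRWISE)
```

With weights in hand, `weighted_cosine(user_a, user_b, weights)` can stand in for the plain similarity measure in a user-based collaborative filter, so that users who agree on heavily weighted features count as closer neighbors. Because the preference vectors are built from item features rather than from ratings alone, two users can be compared even when they have rated no items in common, which is one plausible reading of how a feature-based model eases data sparsity.
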
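[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3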

[References]