Research on a Parallel-Graph-Based Collaborative Filtering Location Recommendation Algorithm in LBSN
Published: 2018-12-17 16:50
【Abstract】: With the rapid growth of the Internet, recommender systems ease the burden users face when filtering for content of interest and help them discover valuable information; they have become an effective remedy for information overload. Collaborative filtering is widely used in recommender systems because it is domain-independent and helps users discover latent interests. With the spread of smartphones and geolocation services, location-based social networks (LBSNs) were introduced by social networking service providers and have become popular with the public. An LBSN obtains users' geographic locations in real time and effectively combines the virtual information that spreads through the online network with users' real-world locations. To meet the demand for location recommendation in LBSNs, academia and industry have applied collaborative filtering to the task. Location recommendation in LBSNs helps ordinary users find new places of interest, and it helps merchants promote and market their own brands. However, LBSN data is very large, heterogeneous, and multidimensional, so the collaborative filtering location recommendation algorithms proposed so far for LBSNs still leave considerable room for improvement in real-time performance and recommendation accuracy. Specifically, targeting real-time location recommendation that takes time and place into account, this thesis makes the following contributions:
(1) A graph-based rating data model is built, and traditional collaborative filtering is combined with a parallel graph computing framework and an improved K-nearest neighbors (KNN) algorithm to yield the GK-CF (Graph KNN Collaborative Filtering) algorithm. Graph message propagation and an improved similarity model are used to filter users first and compute similarity only afterwards. Based on the node structure of the user-item bipartite graph, a graph shortest-path algorithm quickly locates the items to be rated.
(2) Building on GK-CF and incorporating the spatio-temporal information in LBSNs, the thesis further proposes LGP-CF (Location Graph Place Collaborative Filtering), a collaborative filtering location recommendation algorithm that combines spatio-temporal information. The dataset is partitioned according to regularities in user check-in behavior, which reduces the amount of data to be processed. A clustering algorithm produces the set of similar users, narrowing the range from which similar users are selected. Trajectory data and point data are combined to compute similarity. Finally, with locations clustered by latitude and longitude, the set of recommendable locations is found quickly and reliably.
(3) Both algorithms are parallelized and optimized with the GraphX parallel graph framework on the Spark platform. Optimizing the algorithm flow and tuning performance effectively improve scalability and real-time performance. Experiments on a real physical cluster show that, compared with other collaborative filtering algorithms, the proposed algorithms achieve good recommendation accuracy and rating-prediction accuracy on metrics such as RMSE, precision, and recall, and that metrics such as speedup indicate good scalability and real-time performance.
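The "filter users first, then compute similarity" idea in contribution (1) can be illustrated with a small single-machine sketch. This is not the thesis's GK-CF implementation: the abstract does not give its similarity model or its GraphX code, so cosine similarity, the two-hop candidate filter, the helper names, and the toy ratings below are all assumptions chosen for illustration. An `rmse` helper is included because RMSE is one of the metrics the abstract reports.

```python
from collections import defaultdict
from math import sqrt


def build_bipartite(ratings):
    """Index (user, item, score) triples as a user-item bipartite graph."""
    by_user, by_item = defaultdict(dict), defaultdict(dict)
    for user, item, score in ratings:
        by_user[user][item] = score
        by_item[item][user] = score
    return dict(by_user), dict(by_item)


def candidate_neighbours(user, by_user, by_item):
    """Users two hops away (user -> item -> user), i.e. sharing a rated item.

    Assumed stand-in for the message-propagation filtering step: users with
    no item in common are never compared.
    """
    found = set()
    for item in by_user.get(user, {}):
        found.update(by_item[item])
    found.discard(user)
    return found


def cosine(a, b):
    """Cosine similarity of two rating dicts (an assumed similarity model)."""
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(user, item, by_user, by_item, k=2):
    """Similarity-weighted mean over the k nearest candidates who rated item."""
    raters = by_item.get(item, {})
    scored = [
        (cosine(by_user[user], by_user[v]), raters[v])
        for v in candidate_neighbours(user, by_user, by_item)
        if v in raters
    ]
    top = sorted(scored, reverse=True)[:k]
    weight = sum(sim for sim, _ in top)
    if weight == 0:
        return None  # no usable neighbour: no prediction
    return sum(sim * score for sim, score in top) / weight


def rmse(pairs):
    """Root-mean-square error over (predicted, actual) pairs."""
    return sqrt(sum((p - t) ** 2 for p, t in pairs) / len(pairs))


# Hypothetical toy data: u4 shares no item with u1, so it is filtered out
# before any similarity is computed.
ratings = [
    ("u1", "a", 5), ("u1", "b", 3),
    ("u2", "a", 5), ("u2", "b", 3), ("u2", "c", 4),
    ("u3", "a", 1), ("u3", "c", 2),
    ("u4", "d", 5),
]
by_user, by_item = build_bipartite(ratings)
print(predict("u1", "c", by_user, by_item, k=2))
```

In a GraphX setting the two-hop filter would plausibly be expressed as message passing along the edges of the bipartite graph rather than as dictionary lookups; the sketch only shows why filtering before similarity computation reduces the number of user pairs that need to be compared.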
【Degree-granting institution】: Beijing Jiaotong University
【Degree level】: Master's
【Year degree conferred】: 2017
【Classification number】: TP391.3
Article ID: 2384472
Article link: https://www.wllwen.com/guanlilunwen/yingxiaoguanlilunwen/2384472.html