Research on Recommendation Algorithms Based on Time Effects

Published: 2018-12-10 19:20

[Abstract]: As the Internet has grown, information overload has become increasingly severe, and users spend more and more time finding the products or information they want. In the future, people may obtain the information that interests them not through search engines alone but through a combination of search engines and recommender systems. The value of a recommender system lies not only in recommending items that match a user's interests but also in surfacing long-tail items, which better reflect the personalized needs of small groups of users. Research on recommendation algorithms is attracting growing attention, and competitions such as the Netflix Prize and KDD Cup 2012 Track 1 have further accelerated its development.
Time information, as a form of contextual information, can improve recommendation quality. On the one hand, a recommender system must recommend products that match a user's interests accurately and must also do so at the right time; on the other hand, users respond differently to the same recommendation at different times. Time information has therefore drawn attention from a growing number of researchers, and many time-aware recommendation algorithms have been proposed. Some add time features directly to the model; others leave time features out of the model but use them to select the data on which the model is built.
This thesis improves on existing methods for introducing time factors into recommendation algorithms. Time factors are mainly used to model three things: how user interest changes over time, how item popularity changes over time, and how the interest of the social group as a whole changes over time. The last is easy to model. The difficulty lies in the first two, because different users' interests follow different trends and different items' popularity follows different trends. Many current time-aware algorithms ignore these differences and apply one interest-change model to all users and one popularity-change model to all items. To address this, the thesis proposes modeling the interest trend of each user and the popularity trend of each item separately. Because a user's current behavior is influenced by that user's recent behavior, the thesis models a user's interest at the current moment by assigning different weights to the user's recent behaviors; in other words, differences in how much each recent behavior contributes to current interest indirectly capture the different interest trends of different users. Likewise, an item's current popularity is modeled by assigning different weights to its recent popularity. To solve for these weights, the rating time series of each user and of each movie is used as training data: the data are first partitioned by time, the mean rating of each time slice is then computed, and finally the model parameters are solved with stochastic gradient descent.
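The weight-fitting procedure described at the end of the abstract can be illustrated with a minimal Python sketch. The abstract does not give the model's exact form, so the sketch assumes a simple one: the mean rating of a time slice is predicted as a weighted sum of the means of the `k` preceding slices, and the weights are fitted by stochastic gradient descent on squared error. The function names (`slice_means`, `fit_recency_weights`, `predict_next`) and the parameters `slice_len`, `k`, `lr`, and `epochs` are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
import random

def slice_means(ratings, slice_len):
    """Group (timestamp, rating) pairs into fixed-length time slices and
    return the mean rating of each non-empty slice, in time order.
    Assumption: empty slices are simply skipped."""
    buckets = {}
    for ts, r in ratings:
        buckets.setdefault(ts // slice_len, []).append(r)
    return [sum(v) / len(v) for _, v in sorted(buckets.items())]

def fit_recency_weights(means, k=2, lr=0.01, epochs=2000, seed=0):
    """Fit weights w so that means[t] ~ sum_j w[j] * means[t-1-j],
    using SGD on squared error. w[0] weights the most recent slice.
    Assumption: this linear form stands in for the thesis's model."""
    rng = random.Random(seed)
    w = [1.0 / k] * k  # start from equal weights
    # Each sample pairs the k previous slice means (most recent first)
    # with the slice mean that followed them.
    samples = [(means[t - k:t][::-1], means[t]) for t in range(k, len(means))]
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            # Gradient of 0.5 * err**2 with respect to w[j] is err * x[j].
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def predict_next(means, w):
    """Predict the next slice mean from the most recent len(w) slices."""
    recent = means[-len(w):][::-1]
    return sum(wi * xi for wi, xi in zip(w, recent))
```

Under these assumptions, one set of weights would be fitted per user from that user's rating series, and the same functions could be applied per movie to that movie's rating series, so that each user and each item gets its own trend.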
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP391.3
