Research and Application of User Profiling in Content Push

Published: 2018-04-13 02:00

Topic: mobile data + user profiling; Source: master's thesis, North China University of Technology, 2017

【Abstract】: In the big data era, it is difficult for mobile users to find the content services they are interested in among massive amounts of information, and equally difficult for content service providers to identify their user groups and serve them better. To address this problem, this thesis implements a subsystem that automatically provides personalized content push services to users. First, the mobile data generated by users is collected through the App installed on their devices. According to whether it changes over time, this data is divided into static data, which is the user's basic information, and dynamic data, which is the user's behavioral data: interest data, mobile application (App) data, location data, smart-terminal usage data, and so on. For each data type, a separate tag library is built as a tree. The back-end system then edits various pieces of content, organizes them into meaningful content services, and maps these services to the corresponding tags to form a content library.

On the basis of the tag system and the content library, and centering on the user, the day is divided into eight time periods according to people's daily activity patterns, such as working hours, lunchtime, and rest time, and the number of interest tags of each user in each period is counted. Different weighting methods are used for different data types: interest data is weighted with a custom formula; mobile App data with an improved TF-IDF (term frequency-inverse document frequency) algorithm; and location data and smart-terminal usage data with statistical methods. The computed values serve as weights: the larger the value, the stronger the user's preference for the tag. The tags are then sorted and the Top-N tags are selected as the individual user profile. Building on the profiling results, classification algorithms are used to predict the interests of users of different genders and ages in different time contexts. The thesis studies the traditional KNN (K-Nearest Neighbors), SVM (Support Vector Machine), BP (Backpropagation) neural network, and DNN (Deep Neural Network) algorithms, runs experiments on the Iris dataset and on the project's own dataset, and, after comparing the accuracy and running time of the algorithms, selects DNN as the prediction algorithm.

Finally, taking into account the user's current location context and time context, a push algorithm applies a location-first, time-second strategy: it uses the user profile and the predicted interest tags to select content services from the content library and pushes them to the user automatically. Experiments show that the DNN-based personalized push subsystem provides personalized content push services that follow changes in the user's location and time context, and that it achieves better system performance than traditional push services.
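The abstract divides the day into eight periods and counts each user's interest tags per period, but it names only some of the periods (working hours, lunchtime, rest time) and gives no boundaries. The Python sketch below illustrates only the counting step, assuming eight hypothetical start hours (`PERIOD_STARTS`), numbering the periods 0-7 instead of naming them, and taking timestamped `(hour, tag)` events as input; all names and data are illustrative.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

# Hypothetical start hours of eight daily periods (indices 0-7); the thesis
# names only some of its periods (work, lunch, rest) and gives no boundaries.
PERIOD_STARTS = [0, 7, 9, 12, 14, 18, 20, 23]

def period_of(hour: int) -> int:
    """Return the index of the period that contains `hour` (0-23)."""
    return bisect_right(PERIOD_STARTS, hour) - 1

def count_tags_by_period(events: list[tuple[int, str]]) -> dict[int, Counter]:
    """Count interest tags per period from (hour, tag) events."""
    counts: dict[int, Counter] = defaultdict(Counter)
    for hour, tag in events:
        counts[period_of(hour)][tag] += 1
    return counts

# Hypothetical events for one user: (hour of day, interest tag)
events = [(8, "news"), (8, "news"), (12, "food"), (13, "video"), (21, "video")]
by_period = count_tags_by_period(events)
print(dict(by_period[3]))  # {'food': 1, 'video': 1}
```

These per-period counts are the raw input to the weighting step; the abstract applies a different weighting method to each data type.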
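The abstract says App data is weighted with an improved TF-IDF algorithm but does not describe the improvement. The sketch below therefore shows only the standard TF-IDF form, under the assumption that each user's App-derived tags are treated as one "document" and the whole user population as the corpus. It then applies the Top-N selection that turns tag weights into a profile. The function names and the sample data are hypothetical.

```python
import math
from collections import Counter

def tfidf_weights(user_tags: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Standard TF-IDF: each user's App tags form one 'document', all users the corpus."""
    n_users = len(user_tags)
    # Document frequency: how many users have each tag at least once.
    df = Counter()
    for tags in user_tags.values():
        df.update(set(tags))
    weights = {}
    for user, tags in user_tags.items():
        counts = Counter(tags)
        total = len(tags)
        # tf = share of this user's tags; idf = log(N / df)
        weights[user] = {t: (c / total) * math.log(n_users / df[t]) for t, c in counts.items()}
    return weights

def top_n_profile(tag_weights: dict[str, float], n: int) -> list[str]:
    """Sort tags by descending weight (ties broken by name) and keep the first n."""
    ranked = sorted(tag_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:n]]

# Hypothetical App-category tags observed per user
usage = {
    "u1": ["video", "video", "video", "news", "game"],
    "u2": ["news", "news", "shopping"],
    "u3": ["video", "news", "music"],
}
weights = tfidf_weights(usage)
print(top_n_profile(weights["u1"], 2))  # ['video', 'game']
```

In this toy data every user has `news`, so its IDF, and therefore its weight, is zero, and `game` outranks it despite appearing only once. This is standard TF-IDF behavior and not necessarily that of the thesis's improved variant.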
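The thesis compares KNN, SVM, a BP neural network, and a DNN, and ultimately selects DNN. To illustrate only the shape of the prediction step, the sketch below implements the simplest of those baselines, a from-scratch KNN. It assumes each sample is encoded as (gender 0/1, age, time-period index) and labeled with an interest tag. The encoding and data are hypothetical, and this is not the thesis's DNN.

```python
import math
from collections import Counter

# Hypothetical encoding: ((gender 0/1, age in years, time-period index 0-7), interest tag)
Sample = tuple[tuple[int, int, int], str]

def knn_predict(train: list[Sample], query: tuple[int, int, int], k: int = 3) -> str:
    """Predict an interest tag by majority vote among the k nearest training samples."""
    # Sort training samples by Euclidean distance to the query.
    neighbours = sorted((math.dist(features, query), label) for features, label in train)
    votes = Counter(label for _, label in neighbours[:k])
    # most_common keeps first-seen order on ties, so the nearer label wins a tie.
    return votes.most_common(1)[0][0]

train = [
    ((1, 23, 2), "games"),
    ((1, 25, 2), "games"),
    ((1, 24, 6), "sports"),
    ((0, 30, 2), "shopping"),
    ((0, 32, 2), "shopping"),
    ((0, 31, 6), "video"),
]
print(knn_predict(train, (1, 24, 2)))  # games
print(knn_predict(train, (0, 31, 2)))  # shopping
```

The features are left unscaled to keep the sketch short. With real data, age would dominate the Euclidean distance, so min-max or z-score normalization is usually applied first.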
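The final step selects content with a location-first, time-second strategy. The abstract does not give the push algorithm itself, so the sketch below assumes one plausible reading. Content is first filtered to the user's interest tags (the Top-N profile tags plus the predicted tags). Items whose location scenario matches the user's current location are ranked first, and items matching the current time period fill the remaining slots. The `Content` fields, scenario labels, and library entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Content:
    title: str
    tags: set[str]                  # interest tags this content is mapped to
    location: Optional[str] = None  # location scenario it targets, if any
    period: Optional[str] = None    # time period it targets, if any

def select_push(library: list[Content], interests: set[str],
                location: str, period: str, limit: int = 3) -> list[Content]:
    """Rank interest-matching content: location matches first, then time-period matches."""
    relevant = [c for c in library if c.tags & interests]
    ranked = [c for c in relevant if c.location == location]
    # Time-scenario matches fill the remaining slots (skipping items already ranked).
    ranked += [c for c in relevant if c.location != location and c.period == period]
    return ranked[:limit]

library = [
    Content("Nearby gym deals", {"sports"}, location="gym"),
    Content("Lunch spots near the office", {"food"}, location="office", period="lunch"),
    Content("Evening match highlights", {"sports"}, period="rest"),
    Content("Quick recipes", {"food"}, period="lunch"),
    Content("New game releases", {"games"}, period="rest"),
]
# Interests = Top-N profile tags plus predicted tags (hypothetical values).
picked = select_push(library, {"food", "sports"}, location="office", period="lunch")
print([c.title for c in picked])  # ['Lunch spots near the office', 'Quick recipes']
```

When nothing matches the current location, the ranking falls back entirely to time-period matches, which reflects the "location first, time second" priority stated in the abstract.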
【Degree-granting institution】: North China University of Technology
【Degree level】: Master's
【Year conferred】: 2017
【Classification number】: TP391.3

【References】