User Behavior Analysis System for Community Websites Based on Data Mining
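Posted: 2018-02-10 18:34
Keywords: data mining; behavior analysis; high-dimensional data indexing
Source: Nanjing University of Posts and Telecommunications, master's thesis, 2012
Type: degree thesis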
[Abstract]: As information technology increasingly shapes everyday life, community network services such as Renren, Kaixin, and Tencent Friends have emerged, offering new functions such as knowledge sharing, Q&A, news following, and social networking. Analyzing user behavior in order to deliver services tailored to individual users can greatly improve the user experience. The goal of this thesis is to build a personalized intelligent recommendation engine for community websites: by analyzing the characteristics of community-site users, it uncovers their interests, strengthens the user experience, and provides a prototype reference for websites at the portal or search-engine stage that want to move to the intelligent-recommendation stage.
Drawing on Chinese and international literature on data mining and behavior analysis, the thesis first designs the overall architecture and main business processes of a data-mining-based user behavior analysis system. Following the basic steps for building a data mining system, it then presents a detailed design covering feature collection, feature preprocessing, data mining algorithms for correlated features, and efficient indexing of feature data, and it also describes the system's time-scheduling mechanism.
To analyze the behavior of a massive user base efficiently, the thesis builds on existing research: it implements a high-performance data preprocessing module with an improved regular-expression multi-pattern matching algorithm, models user behavior analysis as a ranking problem so that a Ranking algorithm can be applied for data mining, and finally maps the mined features into a high-dimensional space, where an LSH (locality-sensitive hashing) algorithm provides fuzzy search for high-performance matching and nearest-neighbor queries.
Simulation experiments show three results. First, multiple word segmentation engines combined with a fairly comprehensive lexicon segment users' input text both quickly and with high accuracy. Second, the optimized regular-expression multi-pattern matching algorithm reduces memory consumption to some extent and yields a usable, efficient engine for matching user interests. Third, in tests across different dimensionalities and data scales, the improved LSH algorithm can store and index the interest features of a massive user base: index-building and query times grow only linearly as the feature dimension increases, and matching time does not rise noticeably as the number of users grows. The system therefore largely meets the behavior analysis needs of community websites and offers a feasible approach to analyzing their users' behavior.
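The abstract does not say how the thesis improves regular-expression multi-pattern matching. As a rough illustration of the basic idea only, the sketch below compiles a hypothetical interest lexicon into a single alternation regex with Python's standard `re` module and counts hits per interest in one scan of a user's text. `INTEREST_KEYWORDS`, `build_matcher`, and `match_interests` are illustrative names, not taken from the thesis, and the keywords are English for readability; a production matcher over a large lexicon would more typically use an automaton such as Aho-Corasick, which Python's backtracking `re` engine does not build.

```python
import re
from collections import Counter

# Hypothetical interest lexicon (not from the thesis): interest label -> keywords.
INTEREST_KEYWORDS = {
    "sports": ["football", "basketball", "marathon"],
    "tech": ["smartphone", "laptop", "python"],
    "travel": ["flight", "hotel", "beach"],
}


def build_matcher(lexicon):
    """Compile every keyword into one alternation regex plus a keyword -> label map."""
    keyword_to_label = {}
    for label, words in lexicon.items():
        for word in words:
            keyword_to_label[word.lower()] = label
    # Longest keywords first: at a given start position, a longer keyword
    # is tried before any shorter keyword that is its prefix.
    alternatives = sorted(keyword_to_label, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in alternatives), re.IGNORECASE)
    return pattern, keyword_to_label


def match_interests(text, pattern, keyword_to_label):
    """Scan the text once and count keyword hits per interest label."""
    counts = Counter()
    for match in pattern.finditer(text):
        counts[keyword_to_label[match.group(0).lower()]] += 1
    return counts


pattern, kw2label = build_matcher(INTEREST_KEYWORDS)
print(match_interests("Booked a flight and a beach hotel; watched basketball.", pattern, kw2label))
```

Sorting the alternatives longest-first matters because Python's `re` tries alternatives from left to right and accepts the first one that matches, not the longest one.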
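The final stage in the abstract maps interest features into a high-dimensional space and uses LSH for fuzzy nearest-neighbor search. The thesis uses an "improved" LSH whose details the abstract does not give, so the sketch below assumes the textbook random-hyperplane LSH for cosine similarity instead. Each hash bit records which side of a random hyperplane a vector falls on, so two vectors at angle θ agree on a given bit with probability 1 − θ/π. Concatenating `n_bits` bits per table filters out dissimilar users, and using several tables recovers near neighbors that a single table would miss. The class name, its parameters, the user IDs, and the four-dimensional interest vectors are all hypothetical.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Exact cosine similarity, used to re-rank LSH candidates."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class HyperplaneLSH:
    """Textbook random-hyperplane LSH (an assumption, not the thesis's improved variant)."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        # Each table gets its own n_bits random Gaussian hyperplanes.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = {}

    @staticmethod
    def _signature(planes, vec):
        # One bit per hyperplane: 1 if the vector lies on its non-negative side.
        return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0) for plane in planes)

    def add(self, user_id, vec):
        """Store the vector and file the user into one bucket per table."""
        self.vectors[user_id] = vec
        for planes, table in zip(self.planes, self.tables):
            table[self._signature(planes, vec)].append(user_id)

    def query(self, vec, k=3):
        """Union the matching buckets of all tables, then re-rank by exact cosine."""
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._signature(planes, vec), []))
        ranked = sorted(candidates, key=lambda u: (-cosine(vec, self.vectors[u]), u))
        return ranked[:k]


# Hypothetical interest weights per user: [sports, tech, travel, music].
index = HyperplaneLSH(dim=4, n_bits=4, n_tables=3, seed=42)
index.add("alice", [5, 1, 0, 0])
index.add("bob", [4, 2, 0, 0])
index.add("carol", [0, 0, 6, 1])
print(index.query([5, 1, 0, 0], k=2))
```

A query only re-ranks the users in its matching buckets, so its cost depends mainly on bucket sizes rather than on the total number of indexed users, and computing a signature is linear in the vector dimension. These general properties are what make LSH a fit for the scaling behavior the abstract reports; the sketch does not reproduce the thesis's measurements.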
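[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: TP311.13; TP393.092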
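[Citing literature]
Master's theses (1 entry):
1. Xu Xiongwei (徐雄威); Research on Ontology-Based Context-Aware User Behavior Inference for "Sciencepaper Online" (科技论文在线)[D]; Wuhan University of Technology; 2013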
Article ID: 1501166
Permalink: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1501166.html