Research on Personalized Search Ranking Based on a User Interest Model

Published: 2018-03-06 20:46

Topic: user interest model　Focus: personalization factor　Source: Zhejiang Sci-Tech University, 2015 master's thesis　Document type: degree thesis

[Abstract]: With the arrival of the information age, the volume of data on the Internet is growing exponentially. On the one hand, the crawl coverage of search engines falls far behind the rate at which information grows; on the other hand, both the number and the sophistication of Internet users are rising, which places higher demands on search engines. How a search engine can deliver a better user experience and rank results more precisely against individual needs is a research focus and development direction for modern personalized search engines. This thesis starts from an analysis of overall search engine architecture, proposes the concept of a personalization factor, analyzes how a user interest model is built and updated, and finally implements a prototype personalized search engine based on the user interest model. The main work is as follows:
1. Survey of current approaches to building personalized search engines. The approaches covered are query refinement, page weighting, meta-search engine merging, and personalized web crawling. On this basis, the thesis adopts a combination of query refinement and page weighting to build its personalized search engine.
2. Construction of the user interest model. Starting from the concept of an interest page, the thesis proposes an interest page judgment formula and an original way of decoupling the interest model from the user interest model. The interest model is generated from ODP as a tree structure with interest levels, while the user interest model is a vector of keywords and weights; a mapping between the two is used to convert between them in practice. The work concentrates on the construction scheme for the user interest model: page feature words are extracted from interest pages, the judgment formula is applied to obtain the user's interest feature words, and the weight of each interest feature word is recalculated according to the position in which it appears. The update strategy for the user interest model is expressed through changes in weights: different forgetting factors are used to update the weights for long-term interests, for short-term interests, and for the hierarchy level at which an interest word sits.
3. Introduction of the personalization factor into the Lucene formula. The thesis analyzes the Lucene scoring mechanism and, taking advantage of its open source code and good extensibility, adds the weights of the user interest model to the ranking algorithm so that the ranked results reflect the user's interest preferences.
4. Implementation of a prototype personalized search engine, with a comparative analysis of the results. The engine is built with Nutch and Solr, the open source framework that wraps Lucene, and the Solr service is called from program code. Because the tokenizer bundled with Solr does not support Chinese, the third-party IKAnalyzer plug-in is used for word segmentation. Finally, several groups of keywords are queried and the results compared, demonstrating that the personalization factor used in the thesis is feasible in practice.
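As a rough illustration of points 2 and 3 of the abstract, the Python sketch below shows one way a keyword-weight user interest model could be decayed with separate forgetting factors and then folded into a ranking score. It is not the thesis's implementation. The abstract gives neither the forgetting factor values nor the modified Lucene formula, so the decay constants, the additive combination, the `alpha` parameter, and every name here are assumptions made for illustration. The hierarchy-level forgetting factor is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical forgetting factors: short-term interests fade faster
# than long-term ones. The thesis's actual values are not given.
LONG_TERM_FORGETTING = 0.95
SHORT_TERM_FORGETTING = 0.70


@dataclass
class UserInterestModel:
    """User interest model as keyword -> weight vectors (assumed layout)."""
    long_term: Dict[str, float] = field(default_factory=dict)
    short_term: Dict[str, float] = field(default_factory=dict)

    def decay(self) -> None:
        """Apply one update step, scaling each weight by its forgetting factor."""
        for term in self.long_term:
            self.long_term[term] *= LONG_TERM_FORGETTING
        for term in self.short_term:
            self.short_term[term] *= SHORT_TERM_FORGETTING

    def weight(self, term: str) -> float:
        """Combined interest weight of a term; 0.0 if the user has no interest in it."""
        return self.long_term.get(term, 0.0) + self.short_term.get(term, 0.0)


def personalized_score(base_score: float, doc_terms: List[str],
                       model: UserInterestModel, alpha: float = 0.5) -> float:
    """Add an assumed personalization factor to a base relevance score.

    The factor is the summed interest weight of the distinct terms in the
    document, scaled by alpha. base_score stands in for the engine's score.
    """
    factor = sum(model.weight(term) for term in set(doc_terms))
    return base_score + alpha * factor


def rerank(results: List[Tuple[str, float, List[str]]],
           model: UserInterestModel, alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Re-sort (doc_id, base_score, terms) results by personalized score, highest first."""
    scored = [(doc_id, personalized_score(score, terms, model, alpha))
              for doc_id, score, terms in results]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    model = UserInterestModel(long_term={"lucene": 1.0}, short_term={"solr": 1.0})
    model.decay()
    results = [("doc_a", 1.0, ["cooking"]), ("doc_b", 0.8, ["lucene", "ranking"])]
    for doc_id, score in rerank(results, model):
        print(doc_id, round(score, 3))
```

In this sketch a document with a lower base score can overtake a higher-scoring one when it contains terms the user is interested in, which is the effect the abstract attributes to the personalization factor. An additive combination was chosen only because the abstract speaks of adding the model's weights to the ranking algorithm; a multiplicative boost would be an equally plausible reading.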
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year degree conferred]: 2015
[Classification number]: TP391.3
