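Research on Personalized Search Ranking Based on a User Interest Model
Published: 2018-03-06 20:46
Topic: user interest model  Focus: personalization factor  Source: Zhejiang Sci-Tech University, 2015 master's thesis  Document type: degree thesis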
[Abstract]: With the arrival of the information age, the volume of data on the Internet is growing exponentially. On the one hand, the crawl coverage of search engines lags far behind the rate at which information grows; on the other hand, Internet users are increasing in both number and sophistication, which places higher demands on search engines. How a search engine can deliver a better user experience and rank results more precisely according to individual needs is a research focus and development direction for modern personalized search engines.
This thesis starts from an analysis of the overall architecture and principles of search engines, proposes the concept of a personalization factor, analyzes how a user interest model is built and updated, and finally implements a prototype personalized search engine based on the user interest model. The main work is as follows:
1. Analysis and summary of current schemes for building personalized search engines. These include schemes based on query refinement, page weight assignment, meta-search engine merging, and personalized collection by web crawlers. On this basis, the thesis adopts a combination of query refinement and page weighting to build its personalized search engine.
2. Construction of the user interest model. Based on the concept of an interest page, the thesis proposes a formula for deciding whether a page is an interest page and, as an original contribution, proposes decoupling the interest model from the user interest model. The interest model is generated from ODP (the Open Directory Project) and forms a tree-structured model with interest levels, whereas the user interest model is a vector of keywords and their weights; a mapping between the two is used to convert between them in practice. The thesis concentrates on the construction of the user interest model: page feature words are extracted from interest pages, the decision formula is applied to obtain the user's interest feature words, and the weight of each interest feature word is recomputed according to where the word occurs. The update strategy for the user interest model is expressed through changes in the weights: different forgetting factors are used to update the weights for long-term interests, for short-term interests, and according to the hierarchy level at which an interest word sits.
3. Introduction of a personalization factor into the Lucene scoring formula. The thesis analyzes Lucene's scoring mechanism and, taking advantage of its open-source nature and good extensibility, adds the weights from the user interest model to the ranking algorithm so that the ranked results reflect the user's interest preferences.
4. Implementation of a prototype personalized search engine and comparative analysis of the results. The personalized search engine is built with Nutch and with Solr, an open-source framework that wraps Lucene's functionality, and the Solr service is called from program code. Because the tokenizer bundled with Solr does not support Chinese, the third-party IKAnalyzer plug-in is used for word segmentation. Finally, several groups of keywords are selected as queries and the results are compared and analyzed, demonstrating that the personalization factor used in this thesis is feasible in practice.
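The abstract does not reproduce the thesis's formulas, so the following Python sketch is only an illustration of the two ideas described in items 2 and 3: decaying keyword weights with different forgetting factors for long-term and short-term interests, and folding the resulting weights into a ranking score. Every name here (InterestTerm, decay_profile, personalization_factor, rerank), the factor values 0.95 and 0.8, and the choice of a multiplicative factor of the form 1 + (sum of matched weights) are assumptions made for illustration; they are not taken from the thesis or from the Lucene API, and the hierarchy-level factor is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterestTerm:
    """One entry of a hypothetical user interest vector: a keyword weight."""
    weight: float
    long_term: bool = False


def decay_profile(profile, short_factor=0.8, long_factor=0.95):
    """Apply one update step to every keyword weight.

    Long-term interests are assumed to fade more slowly (larger factor)
    than short-term ones. Both factor values are illustrative only.
    """
    return {
        term: InterestTerm(
            weight=entry.weight * (long_factor if entry.long_term else short_factor),
            long_term=entry.long_term,
        )
        for term, entry in profile.items()
    }


def personalization_factor(doc_terms, profile):
    """Return 1 + the summed profile weights of keywords found in the document.

    A document that matches no interest keyword gets a neutral factor of 1.0.
    """
    matched = set(doc_terms) & set(profile)
    return 1.0 + sum(profile[term].weight for term in matched)


def rerank(hits, profile):
    """Re-order search hits by base score times the personalization factor.

    hits is a list of (doc_id, base_score, doc_terms) tuples, where
    base_score stands in for whatever the search engine returned.
    """
    scored = [
        (doc_id, base_score * personalization_factor(doc_terms, profile))
        for doc_id, base_score, doc_terms in hits
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions, a hit whose text contains a heavily weighted interest keyword can overtake a hit with a slightly higher base score, which is the effect the abstract describes as ranking results that reflect user preferences. In a real Lucene or Solr deployment the factor would have to be wired into the engine's own scoring or boosting mechanism rather than applied afterwards as done here.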
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year degree conferred]: 2015
[Classification number]: TP391.3
Article ID: 1576455
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1576455.html