Research and Application of Information Retrieval Algorithms in Smart Tourism

Published: 2018-06-07 01:11

Topics: search engine + interest model; Source: Zhejiang Sci-Tech University, 2017 master's thesis

[Abstract]: As living standards rise, travel has become a leisure activity for the vast majority of people, and with information technology now widespread, users planning a trip usually start by looking up travel information on a search platform. The volume of travel information on the Internet keeps growing and is increasingly tangled, so users care more and more about the relevance of what a search platform returns: after entering a query, they expect the most relevant and most reliable travel information at the very top of the results. Presenting the most relevant and reliable sources as search results, so that users genuinely benefit from smart tourism, is one of the pressing problems a search platform must solve, and ranking algorithms have accordingly become a key research direction for search engines. This thesis studies information retrieval algorithms for smart tourism in three parts. (1) Analysis of the traditional PageRank algorithm. After examining the principle and shortcomings of traditional PageRank and drawing on earlier improvements, the thesis proposes SM-PageRank, an algorithm based on the similarity of linked pages. It introduces the similarity between a page and the pages it links to into the PageRank computation, so that weight is distributed among linked pages more reasonably. (2) Secondary ranking of results based on a user interest model. A user interest model is first built for each user. When the user searches, the search engine returns a first-pass ranked result set; the similarity between each page in that set and the user interest model is computed, each page's score is recalculated using that similarity, and the pages are sorted in descending order of the new score before the final ranking is shown to the user. Because the secondary ranking rests on the user interest model, the thesis analyzes in more depth how user interests are acquired and how the model is built and updated, so that the model can better re-rank the first-pass result set. (3) An experimental smart-tourism retrieval platform built with Nutch and Solr. Nutch crawls the experimental data sources, and SM-PageRank and traditional PageRank are each applied in Nutch. Solr uses IKAnalyzer for Chinese word segmentation, and queries are issued through the search services Solr provides. The experimental results show that SM-PageRank ranks results more accurately than traditional PageRank, and that applying the secondary ranking further improves the accuracy of the search results, so the final ranking better matches user needs.
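The two ranking ideas in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the thesis's formulas, so everything specific here is an assumption: cosine similarity over term-count vectors as the similarity measure, splitting a page's score across its out-links in proportion to that similarity, and a linear blend (weight `alpha`) of the first-pass score with the interest similarity. The function names `cosine`, `similarity_weighted_pagerank`, and `rerank`, and the sample pages, are hypothetical and are not taken from the thesis, Nutch, or Solr.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dict-like)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_weighted_pagerank(links, vectors, damping=0.85, iterations=50):
    """PageRank variant (assumed scheme): a page passes its score to its
    out-links in proportion to content similarity instead of uniformly."""
    pages = list(vectors)
    n = len(pages)
    share = {}  # share[p][q] is the fraction of p's score sent to q
    for p in pages:
        targets = list(dict.fromkeys(links.get(p, [])))  # drop duplicates
        sims = {q: cosine(vectors[p], vectors[q]) for q in targets}
        total = sum(sims.values())
        if total > 0:
            share[p] = {q: s / total for q, s in sims.items()}
        else:
            # No textual overlap with any target: fall back to a uniform split.
            share[p] = {q: 1.0 / len(targets) for q in targets}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Score held by pages without out-links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not share[p])
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            for q, w in share[p].items():
                new_rank[q] += damping * rank[p] * w
        rank = new_rank
    return rank


def rerank(results, interest, vectors, alpha=0.5):
    """Second pass (assumed blend): mix the first-pass score with the page's
    similarity to the user interest vector, then sort in descending order."""
    rescored = [
        (page, (1 - alpha) * score + alpha * cosine(interest, vectors[page]))
        for page, score in results
    ]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Hypothetical toy corpus: page id -> term counts, page id -> out-links.
vectors = {
    "westlake": Counter("west lake hangzhou boat tour".split()),
    "hotel": Counter("hangzhou hotel booking".split()),
    "tea": Counter("longjing tea hangzhou west lake".split()),
}
links = {"westlake": ["hotel", "tea"], "hotel": ["westlake"], "tea": ["westlake"]}

ranks = similarity_weighted_pagerank(links, vectors)
first_pass = sorted(ranks.items(), key=lambda item: item[1], reverse=True)
interest = Counter(["tea", "longjing"])  # hypothetical user interest model
final = rerank(first_pass, interest, vectors)
```

In this sketch a link to a topically unrelated page carries less weight than a link to a similar one, and `alpha` controls how strongly the user interest model can reorder the first-pass results; `alpha=0` leaves the first-pass order unchanged.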
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3
