Research and System Implementation of Topic-Oriented Meta-Search Engine Technology
Published: 2019-06-03 15:59
[Abstract]: With the explosive growth of information on the Internet, users often cannot find the information they want quickly and accurately. Traditional search engines have solved the problem of Internet information retrieval to some extent, but the precision of today's mainstream search engines is not high, and because different engines use different algorithms and cover different ranges, their results differ considerably. According to Chinese search engine user behavior research reports and market survey reports, the overlap among the search results of major engines such as Google, Baidu, and Yahoo is below 34%. To obtain results that are both comprehensive and accurate, a user has to switch among engines and query several of them repeatedly, which is inconvenient given the quickening pace of life. Meta-search engines emerged in response; because their results are more comprehensive than those of a traditional search engine, they have developed rapidly.

A meta-search engine is a web retrieval tool that integrates multiple member search engines, which gives its results wide coverage and high recall. Like traditional search engines, however, meta-search engines do not take users' personalized needs into account. A personalized meta-search engine combines the high recall of meta-search with the precision of personalization techniques, and thereby effectively remedies the shortcomings of existing search engines.

This thesis first analyzes the shortcomings of traditional search engines, gives an overview of meta-search engines and personalization techniques, and, through a review of related theory and technology, summarizes the research status and development trends of both. It then studies in depth the theory and techniques for implementing personalized search engines, and compares and analyzes them. On this basis, it proposes the following algorithms for a personalized meta-search engine:

For searches on topics that follow a regular pattern in HTML source code, the thesis designs, on top of a meta-search engine, a personalized meta-search algorithm for telephone number queries. It exploits the high recall of meta-search to extract telephone numbers and information about the users who own them, while discarding all other, irrelevant information.

For topic searches with no such regularity, the thesis proposes a new topic search framework that reduces the set of topic words using a topic dictionary to obtain an ordered topic vocabulary, and that takes the structural features of web pages into account when computing page relevance.

For the browsing history on different users' hosts, the thesis uses tasklist.exe to track users' search history. The method tracks different browsers at the same time; after the tracking results are processed, a human-computer interaction interface provides queries over a user's browsing history.

Finally, a prototype system meeting the above requirements is designed and implemented.
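The telephone-number extraction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual patterns or data structures, so everything here is an assumption for illustration: the regular expression (mainland China mobile and landline formats), the function name `extract_phone_records`, and the idea of keeping a short window of preceding text as the "owner information".

```python
import re
from html import unescape
from typing import Dict, List

# Assumed patterns (not taken from the thesis): 11-digit mobile numbers
# starting with 1, and landlines with a 3-4 digit area code beginning with 0.
PHONE_RE = re.compile(r"(?<!\d)(?:1[3-9]\d{9}|0\d{2,3}-?\d{7,8})(?!\d)")
TAG_RE = re.compile(r"<[^>]+>")


def extract_phone_records(html: str, context: int = 20) -> List[Dict[str, str]]:
    """Return each distinct phone number in `html` with the text just before it.

    Everything that is not a phone number or its nearby context is discarded,
    mirroring the "remove unrelated information" idea in the abstract.
    """
    # Strip tags, decode entities, and collapse whitespace into single spaces.
    text = unescape(TAG_RE.sub(" ", html))
    text = re.sub(r"\s+", " ", text).strip()

    records: List[Dict[str, str]] = []
    seen = set()
    for match in PHONE_RE.finditer(text):
        number = match.group().replace("-", "")
        if number in seen:  # the same number may appear in several results
            continue
        seen.add(number)
        start = max(0, match.start() - context)
        records.append(
            {"phone": number, "context": text[start:match.start()].strip()}
        )
    return records


if __name__ == "__main__":
    sample = (
        "<div><b>Zhang Wei</b> Tel: 13812345678</div>"
        "<p>Office 022-60214001 ads 123</p>"
    )
    for record in extract_phone_records(sample):
        print(record)
```

In a meta-search setting, a function like this would be applied to the result pages returned by each member engine, with the `seen` set shared across engines so that overlapping results contribute each number only once.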
[Degree-granting institution]: Tianjin University of Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3
Article ID: 2492050
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2492050.html