Design and Implementation of a Crawler-Based Search System for Small Enterprises
Published: 2019-01-10 11:14
[Abstract]: As the Internet continues to develop, the web has become a primary means of obtaining information, and the content of many portal sites is growing at a remarkable rate. Faced with this volume of information, retrieving the latest and most relevant information more comprehensively and accurately has become a prerequisite for seizing opportunities and meeting challenges. For such portal sites, traditional web search engines suffer from slow updates and low accuracy. To improve information retrieval for specific websites, this thesis designs a site search engine system based on crawler technology. The thesis first briefly introduces and analyzes the research background of crawler-based search engine systems and the search technologies currently available in China and abroad. It then carries out a requirements analysis, which establishes that the system should be real-time and highly accurate, and on that basis presents the overall design framework, the division into modules, and a description of each module. The system uses Maven for project management and Velocity template technology to implement the web robot. The search framework is based on Compass and Chinese word segmentation and is designed using the Service pattern. Multiple search interfaces are implemented with J2EE technologies such as Webwork and Spring, the MVC and Command patterns, and several RPC technologies. The system provides a general-purpose vertical search service for enterprises, is real-time and general-purpose, and can be integrated easily with enterprise applications through its multiple search interfaces.
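The abstract names the Service and Command patterns but gives no code. As a rough illustration only, the following Java sketch shows how a single search service could accept interchangeable command objects over a toy in-memory inverted index. Every class and method name here (`SearchSketch`, `InvertedIndex`, `SearchCommand`, `KeywordSearchCommand`, `SearchService`) is hypothetical and not taken from the thesis. Whitespace tokenization stands in for Chinese word segmentation, and the hand-written index stands in for Compass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names come from the thesis.
class SearchSketch {

    // Toy in-memory inverted index: term -> ids of the documents containing it.
    static class InvertedIndex {
        private final Map<String, Set<String>> postings = new LinkedHashMap<>();

        // Assumption: whitespace tokenization replaces real Chinese word segmentation.
        void add(String docId, String text) {
            for (String term : text.toLowerCase().split("\\s+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, k -> new LinkedHashSet<>()).add(docId);
                }
            }
        }

        // Returns the ids of documents containing the term, or an empty set.
        Set<String> lookup(String term) {
            return postings.getOrDefault(term.toLowerCase(), new LinkedHashSet<>());
        }
    }

    // Command pattern: each kind of search request is an object with one execute method.
    interface SearchCommand {
        List<String> execute(InvertedIndex index);
    }

    // Returns the documents that contain every query term (AND semantics).
    static class KeywordSearchCommand implements SearchCommand {
        private final List<String> terms;

        KeywordSearchCommand(String... terms) {
            this.terms = Arrays.asList(terms);
        }

        @Override
        public List<String> execute(InvertedIndex index) {
            Set<String> result = null;
            for (String term : terms) {
                Set<String> hits = index.lookup(term);
                if (result == null) {
                    result = new LinkedHashSet<>(hits); // copy so the index is not mutated
                } else {
                    result.retainAll(hits);             // intersect with the next term's hits
                }
            }
            return result == null ? new ArrayList<>() : new ArrayList<>(result);
        }
    }

    // Service facade: each front end (web page, RPC endpoint) would call this one entry point.
    static class SearchService {
        private final InvertedIndex index;

        SearchService(InvertedIndex index) {
            this.index = index;
        }

        List<String> run(SearchCommand command) {
            return command.execute(index);
        }
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add("page-1", "enterprise search engine");
        index.add("page-2", "enterprise news portal");
        SearchService service = new SearchService(index);
        System.out.println(service.run(new KeywordSearchCommand("enterprise", "search")));
    }
}
```

In a design along these lines, adding another search interface would mean adding another caller of `SearchService.run`, and adding another query type would mean adding another `SearchCommand` implementation, without changing the index code.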
[Degree-granting institution]: Dalian University of Technology
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: TP391.3
Paper ID: 2406239
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2406239.html