Research on a Cloud Computing Model Based on a Search Box/Resource Pool
Published: 2018-04-02 00:06
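Topic: search box/resource pool  Entry point: distributed searcher  Source: Nanjing University of Aeronautics and Astronautics, master's thesis, 2012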
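[Abstract]: Motivated by the research and development of a pre-research project on a supply-chain cloud for an e-commerce platform, this thesis studies a cloud computing model based on a search box/resource pool. It first analyzes current centralized search engine systems and summarizes their strengths and weaknesses. To address their shortcomings, it then builds a cloud computing framework based on the search box/resource pool. Guided by object-oriented methodology, the work concentrates on the analysis, design, and implementation of a cloud search engine. The main contribution is an improvement of the functional modules of a traditional search engine: the searcher (crawler), the indexer, and the query processor are each analyzed in detail and redesigned with distributed cloud computing techniques. Following the Map/Reduce programming model, data computation tasks are encapsulated in the Map function and data merging tasks in the Reduce function. The improved search engine can be deployed in a Hadoop distributed environment built from inexpensive machines, and it significantly improves the precision, recall, and response speed of the search engine system.
First, the thesis presents the cloud computing framework based on the search box/resource pool and gives a preliminary discussion of the definition and characteristics of cloud computing, the search box, the search engine, and the resource pool. It then examines the key technologies of cloud computing and typical cloud computing platforms, parallel workflows and parallel programming models, and resource pools and distributed file systems, and analyzes how a distributed file system and the Map/Reduce programming model are applied to data processing. It also discusses the inverted index mechanism used by search engines, the principles of Chinese word segmentation and the scope of the related software tools, and gives a preliminary discussion of parallelizing a traditional search engine.
Next, based on the business functions of each search engine subsystem, the thesis sets out the design of the distributed search engine system. It specifies the functions of the three core components of a search engine (the searcher, the indexer, and the query processor), gives the execution flow of each component, and describes these flows formally in the BPEL business process language. The Map/Reduce model is then used to improve all three components so that they can process data in a distributed manner.
Then, based on the functional design and the internal execution of the distributed searcher, indexer, and query processor, together with the BPEL process descriptions, the thesis gives class-relationship diagrams for the implementation and adapts the components of the Nutch search engine, originally run in single-machine mode, to a distributed cloud environment. It implements the system in code and discusses the main difficulties encountered during implementation and how they were solved.
Finally, a development environment is set up and a cloud search engine for e-commerce is implemented. The thesis compares and analyzes the data crawled in experiments with a typical distributed searcher component, which verifies the performance advantages and technical feasibility of a search engine running in the cloud computing mode.
The cloud computing model based on the search box/resource pool proposed in this thesis has been applied in the actual development and implementation of a cloud search engine, and it has both theoretical significance and practical engineering value.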
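The abstract's central technique is splitting the work between a Map function (per-record computation) and a Reduce function (merging). A minimal, in-process Python sketch of that split, applied to building an inverted index and answering a simple query against it, might look like the following. It only simulates the map, shuffle, and reduce phases inside one process; it is not Hadoop code and not the thesis's implementation. The function names (`map_document`, `shuffle`, `reduce_term`, `build_inverted_index`, `query_all`) are hypothetical, and whitespace tokenization is a simplifying assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


# Map phase (hypothetical name): the per-document "computation" step.
# Emit one intermediate (term, doc_id) pair per token. Whitespace
# tokenization is an assumption; Chinese text would need segmentation first.
def map_document(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    for term in text.lower().split():
        yield term, doc_id


# Shuffle: group intermediate values by key, which the framework normally
# does between the Map and Reduce phases.
def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    groups: Dict[str, List[str]] = defaultdict(list)
    for term, doc_id in pairs:
        groups[term].append(doc_id)
    return groups


# Reduce phase (hypothetical name): the "merge" step. Collapse every doc id
# seen for one term into a sorted, de-duplicated posting list.
def reduce_term(term: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    return term, sorted(set(doc_ids))


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, List[str]]:
    intermediate = (
        pair for doc_id, text in docs.items() for pair in map_document(doc_id, text)
    )
    grouped = shuffle(intermediate)
    return dict(reduce_term(term, ids) for term, ids in grouped.items())


# Query step (hypothetical name): return documents containing ALL query
# terms by intersecting their posting lists.
def query_all(index: Dict[str, List[str]], terms: List[str]) -> List[str]:
    postings = [set(index.get(t.lower(), [])) for t in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    docs = {
        "d1": "cloud search engine",
        "d2": "distributed search",
        "d3": "cloud resource pool",
    }
    index = build_inverted_index(docs)
    print(index["search"])                       # ['d1', 'd2']
    print(query_all(index, ["cloud", "search"]))  # ['d1']
```

On a real Hadoop cluster the map and reduce steps would typically be written as `Mapper` and `Reducer` subclasses, and the framework itself performs the shuffle by grouping intermediate pairs by key across machines; this single-process version only illustrates the division of labor.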
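The abstract does not say which Chinese word segmentation algorithm or tool the thesis relies on. As one illustration of the dictionary-based principle, a forward maximum matching (FMM) segmenter could be sketched in Python as follows. The function name and the toy dictionary are hypothetical; production segmenters use far larger dictionaries and add ambiguity resolution, which plain greedy FMM lacks.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy left-to-right segmentation (hypothetical helper).

    At each position, take the longest dictionary word that starts there;
    a character that begins no dictionary word becomes a one-character token.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found;
        # size == 1 always succeeds, so the loop always advances.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    # Toy dictionary for illustration only.
    vocab = {"云计算", "计算", "模型", "分布式", "搜索引擎", "搜索", "引擎"}
    print(forward_max_match("分布式搜索引擎", vocab))  # ['分布式', '搜索引擎']
```

Because FMM always prefers the longest match, it yields 搜索引擎 as one token rather than 搜索 + 引擎, which is usually the desired behavior for indexing; the same greed is what causes errors on genuinely ambiguous strings.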
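[Degree-granting institution]: Nanjing University of Aeronautics and Astronautics
[Degree]: Master's
[Year conferred]: 2012
[Classification number]: TP391.3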
Article ID: 1697888
Link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1697888.html