Design and Implementation of a Search System Based on Biomedical Data

Published: 2018-05-05 06:43

Topic: Solr + indexing; Source: Master's thesis, Ningxia University, 2017

[Abstract]: With the rapid growth of online information, people increasingly need ways to extract latent, valuable information from large volumes of web content quickly and effectively. Internet search engines emerged to meet this need: they solve the problem of retrieving online information for users and have become one of the most powerful tools for obtaining information. Search engines are currently one of the most active research areas in Internet technology, and the technology keeps advancing as research deepens. At the same time, designing and implementing a search engine is a large undertaking that demands considerable specialized expertise; this high barrier to entry slows the adoption of search engines. Solr is an open-source, enterprise-grade search server that exposes a Web-service-like API. Users can submit XML files in a defined format to the server over HTTP to build an index, and can issue queries through HTTP GET requests, receiving the results in XML format. Because of its efficiency and its ability to run as a standalone server, Solr has attracted wide attention. This thesis studies and implements a Solr-based search system for biomedical data. The system consists of three modules: data processing, indexing, and search. Building on a study of SolrJ and on the Java EE architecture, the system first analyzes and processes the biomedical data, converting it into the general-purpose XML document format so that the processed data can be used by the Solr search system. Next, drawing on a study of the SolrJ source code, the system parses the processed documents and submits the parsed content to the Solr server, which performs tokenization (word segmentation) and builds the index. In the search module, the user enters query keywords on a front-end page; the query is submitted to the Solr server, and the server's response is returned to the front-end page.
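The data processing step converts biomedical records into XML that Solr can ingest. A minimal sketch of that conversion, assuming the records have already been parsed into Python dictionaries and targeting Solr's XML update format (`<add><doc><field name="...">`), could look like this. The thesis itself works in Java with SolrJ; Python is used here only to keep the illustration short, and the field names (`id`, `title`, `keywords`) are hypothetical rather than taken from the thesis.

```python
import xml.etree.ElementTree as ET


def records_to_solr_xml(records):
    """Serialize a list of dict records as a Solr XML update message.

    Produces <add><doc><field name="...">value</field>...</doc>...</add>.
    List or tuple values are written as repeated <field> elements with the
    same name, which is how Solr's XML format expresses multi-valued fields.
    """
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(v)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical record; real field names depend on the Solr schema in use.
    sample = [{"id": "1", "title": "BRCA1 & breast cancer", "keywords": ["gene", "tumor"]}]
    print(records_to_solr_xml(sample))
```

Building the XML with ElementTree instead of string concatenation means characters such as `&` in biomedical titles are escaped automatically, so the generated file stays well-formed.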
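Solr accepts XML update messages over HTTP at a core's `/update` handler, which is the "submit an XML file via HTTP to build the index" path described above. The sketch below only constructs the POST request with the Python standard library, so it runs without a server. The base URL `http://localhost:8983/solr` (Solr's default port) and the core name `biomed` are assumptions; the thesis performs this step through SolrJ, so this shows the underlying HTTP exchange, not the author's code.

```python
import urllib.request


def build_update_request(base_url, core, xml_body, commit=True):
    """Build (but do not send) a POST of an XML update message to Solr."""
    url = f"{base_url.rstrip('/')}/{core}/update"
    if commit:
        url += "?commit=true"  # ask Solr to make the new documents searchable
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_update_request(
        "http://localhost:8983/solr",  # assumed local Solr instance
        "biomed",                      # hypothetical core name
        '<add><doc><field name="id">1</field></doc></add>',
    )
    print(req.get_method(), req.full_url)
    # Against a running server, urllib.request.urlopen(req) would send it.
```

Committing on every request is convenient for a small demo; for bulk loads it is usually cheaper to send many documents and commit once at the end.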
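In the search module, a query is an HTTP GET against the core's `/select` handler, and the `wt=xml` parameter asks Solr for an XML response. A minimal sketch that builds such a query URL and pulls `numFound` and the stored fields out of a response, again assuming the hypothetical core `biomed` on `localhost:8983` and using only the Python standard library rather than SolrJ:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_select_url(base_url, core, query, rows=10, start=0):
    """Return a Solr /select URL requesting an XML response."""
    params = {"q": query, "start": start, "rows": rows, "wt": "xml"}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


def parse_select_response(xml_text):
    """Extract (numFound, docs) from a Solr XML response (str or bytes).

    Each doc becomes a dict of field name -> text; multi-valued <arr>
    fields become lists. All values are returned as strings.
    """
    root = ET.fromstring(xml_text)
    result = root.find("result[@name='response']")
    if result is None:
        raise ValueError("no <result name='response'> element in response")
    docs = []
    for doc in result.findall("doc"):
        fields = {}
        for child in doc:
            name = child.get("name")
            if child.tag == "arr":
                fields[name] = [item.text for item in child]
            else:
                fields[name] = child.text
        docs.append(fields)
    return int(result.get("numFound")), docs


if __name__ == "__main__":
    print(build_select_url("http://localhost:8983/solr", "biomed", "title:BRCA1"))
```

A front-end page would pass the user's keywords as `query`, fetch the URL, and render the returned list of documents.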
[Degree-granting institution]: Ningxia University
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: TP391.3
