Research and Implementation of Deep Search for Intranet Resources

Published: 2018-05-19 08:06

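Topics: search engine + information retrieval; Source: master's thesis, University of Electronic Science and Technology of China (电子科技大学), 2013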

[Abstract]: Technology is developing rapidly, and innovations in computing have greatly benefited both developers and users. Information retrieval has become a major research direction in the field. Traditional search engines are well known, and most information resources on the Internet are concentrated on the web. In an intranet, however, information is not stored only as web pages; much of it resides in documents of various types and in databases, so user needs are more diverse and more specific. Simply applying a traditional search engine to a complex intranet is not enough, because an intranet places higher demands on security and on the completeness of the resources covered. Searching structured, unstructured, and semi-structured resources securely and comprehensively is therefore the central problem of intranet resource search. Traditional web-based search consists of crawling resources, building an index, retrieval, and ranking the results. Intranet resource search follows similar steps but differs from web search in two respects: security and depth. Traditional search does not define an access policy, yet in a given intranet different users searching for the same resource should not necessarily get the same results. A search engine that enforces a security policy must mask results according to the user's identity, so the policy has to be defined within the search engine itself. During crawling, when a non-web resource is encountered, the file must be processed through a format-specific interface that extracts its text for indexing. This is the depth requirement.
The main work of this thesis is as follows. First, it reviews traditional search engines and their module design, which provides the theoretical foundation and clarifies both the basic functions that intranet search software must offer and the difficulties of implementing them. It then describes the working principles and implementation of the main modules of the intranet resource search engine, including document collection, construction of the index structure, and presentation of search results. Second, it focuses on the security policy and deep search, the two key innovations of the system design. The security policy protects information and suits complex intranets with strict permission requirements; deep search makes information acquisition comprehensive and gives the system good extensibility. Finally, it presents and tests the experimental results, summarizes the significance of intranet resource search, and discusses the system's shortcomings and directions for future improvement.
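The two requirements named in the abstract, masking results by user identity and extracting text from non-web files through format-specific interfaces, can be illustrated with a small Python sketch. It is not the thesis's implementation: the class `IntranetIndex`, the `EXTRACTORS` registry, the group-based access model, and the simple word tokenizer are all assumptions made for illustration.

```python
import csv
import io
import os
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical extractor registry: file extension -> function that turns raw
# bytes into plain text. A real system would add readers for office
# documents, PDFs, database rows, and so on.
Extractor = Callable[[bytes], str]


def extract_txt(data: bytes) -> str:
    return data.decode("utf-8", errors="replace")


def extract_csv(data: bytes) -> str:
    rows = csv.reader(io.StringIO(data.decode("utf-8", errors="replace")))
    return " ".join(cell for row in rows for cell in row)


EXTRACTORS: Dict[str, Extractor] = {".txt": extract_txt, ".csv": extract_csv}


def tokenize(text: str) -> List[str]:
    # Simplification: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


class IntranetIndex:
    """Inverted index whose results are filtered by the caller's groups."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        self.acl: Dict[str, Set[str]] = {}  # doc_id -> groups allowed to see it

    def add_file(self, doc_id: str, filename: str, data: bytes,
                 allowed_groups: Iterable[str]) -> bool:
        """Extract text with the matching extractor and index it.

        Returns False when no extractor is registered for the file type.
        """
        ext = os.path.splitext(filename)[1].lower()
        extractor = EXTRACTORS.get(ext)
        if extractor is None:
            return False
        self.acl[doc_id] = set(allowed_groups)
        for term in tokenize(extractor(data)):
            self.postings[term].add(doc_id)
        return True

    def search(self, query: str, user_groups: Iterable[str]) -> List[str]:
        """Return ids of documents matching all query terms that the user may see."""
        terms = tokenize(query)
        if not terms:
            return []
        candidates = set.intersection(
            *(self.postings.get(t, set()) for t in terms)
        )
        groups = set(user_groups)
        # Security policy: drop every hit the user's groups cannot access.
        return sorted(d for d in candidates if self.acl[d] & groups)
```

In this sketch the access check runs at query time, so two users issuing the same query can receive different result lists, and supporting a new file type only requires registering another extractor.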
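[Degree-granting institution]: University of Electronic Science and Technology of China (电子科技大学)
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.3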
