Design and Implementation of an Oilfield Network Information Collection and Processing System

Published: 2018-01-22 18:47

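Keywords: oilfield network intelligence; full-text search; network information collection engine. Source: University of Electronic Science and Technology of China, 2012 master's thesis. Document type: degree thesis.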

[Abstract]: We live in an information society in which information is among the assets that governments and enterprises value most. The Internet, a vast network holding an enormous volume of information, has naturally become the main source of information for government agencies, enterprises, and public institutions. At present, most oilfield organizations still use Internet information in a fairly rudimentary way, most commonly by looking things up with a search engine. This approach has serious limitations: first, too much irrelevant information gets in the way of finding what is needed; second, new and old information are mixed together, making it hard to find the latest; and third, the results are not classified or organized, so they appear disorderly. To meet the requirements of oilfield government agencies, enterprises, and institutions for specialized network intelligence collection, an oilfield network intelligence collection system needs to be designed and implemented. The system uses a Network Information Collect Engine to copy the content of multiple pre-specified websites to a local database in real time. It then applies Full-Text Search and Document Similarity identification techniques to the information in the repository, merging documents whose content is essentially the same so that users can retrieve and use them more easily. The system's features include:
1. Flexible customization of collection tasks.
2. Management of multiple target data sources.
3. Separate collection configurations for different target data sources, to ensure that data can be collected.
4. Scheduling and management of collection tasks, with updates synchronized to the target sites and incremental collection.
5. Process management for converting the collected results from heterogeneous to homogeneous data.
6. Publication management for collected results: data is published to the application platform through a publishing server.
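The abstract does not say which document-similarity method the thesis uses. As an illustration only, the sketch below shows one common way to merge documents with essentially the same content: compare sets of word shingles with Jaccard similarity and greedily group documents that exceed a threshold. The function names, the shingle size `k`, and the `threshold` value are all hypothetical choices, not taken from the thesis. For Chinese text, a word segmenter or character n-grams would likely replace the simple `\w+` tokenizer assumed here.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts become a single shingle; empty texts have none.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_similar(docs, threshold=0.5, k=3):
    """Greedily group documents by similarity.

    Each document joins the first group whose representative
    (the group's first member) is at least `threshold` similar;
    otherwise it starts a new group. `docs` maps id -> text.
    """
    groups = []  # list of (representative shingle set, [doc ids])
    for doc_id, text in docs.items():
        sh = shingles(text, k)
        for rep, members in groups:
            if jaccard(sh, rep) >= threshold:
                members.append(doc_id)
                break
        else:
            groups.append((sh, [doc_id]))
    return [members for _, members in groups]


if __name__ == "__main__":
    sample = {
        "a": "oil output rose sharply in the northern field this quarter",
        "b": "oil output rose sharply in the northern field this quarter, officials said",
        "c": "new drilling safety rules take effect next month",
    }
    print(group_similar(sample))
```

Greedy grouping against a single representative is simple but order-dependent. A production system handling many collected pages would more plausibly use an approximate method such as MinHash or SimHash to avoid pairwise comparison.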
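[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree conferred]: 2012
[Classification number]: TP311.52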
