Design and Implementation of Data Processing for Precise Question Answering
Published: 2018-10-11 08:21
[Abstract]: With the rapid growth of the Internet, the amount of information online keeps increasing, and people find it ever harder to extract what is useful to them from this mass of information. When users query a search engine, it typically returns a very large number of results that they must still sift through themselves, so a plain ranked list of search results does not meet their needs. Precise question answering gives users an accurate answer to their query, spares them the effort of sifting through results, and offers a better search experience. The most important part of precise question answering is the answer data presented to the user, which places high demands on the accuracy and effectiveness of that data. The answers therefore have to be backed by accurate data, which requires a thorough data processing pipeline for obtaining the answer data. This thesis first discusses the research background and significance of data processing for precise question answering and derives its research content from the results of a survey. It then describes in detail the key technologies and principles involved in the whole data processing pipeline, including Web information extraction, XML, stream computing platforms, and search engine index building. Next, it presents an overall requirements analysis of data processing for precise question answering, together with a functional requirements analysis of the three parts designed and implemented in this thesis: Web information extraction, document splitting, and band member generation. It then presents the overall design of the data processing, giving the high-level design of the Web information extraction module, the document splitting module, and the band member generation module, along with the detailed design and implementation of each module. Finally, it describes the test environment, reports functional and performance tests of the three modules, and summarizes the test results.
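[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP391.1;TP393.09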
Article ID: 2263495
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2263495.html