
Design and Implementation of a Chinese Automatic Question Answering System Based on a Search Engine

Published: 2017-12-28 10:40

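Keywords: design and implementation of a search-engine-based Chinese automatic question answering system. Source: Beijing University of Technology, master's thesis, 2016. Type: degree thesis.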


Related topics: question answering system, Site Q, multi-feature fusion, semantic dependency tree, answer extraction


[Abstract]: We live in an era of abundant information, and people need to obtain information quickly and accurately. Search engines, popular for their convenience and fast response, have become the main way people acquire information. However, keyword queries rarely express a user's intent clearly, and results returned as a collection of web pages still leave users to find the answer by hand. Automatic question answering (QA) systems emerged in response, but traditional QA systems suffer from incomplete coverage and out-of-date information, and they require maintaining a large knowledge base. To combine the strengths of both approaches, this thesis designs and implements an improved search-engine-based Chinese automatic QA system. The main contributions are as follows.

(1) Topic-Site Q, an improvement to the Site Q algorithm. The first and last paragraphs of a document, and the first and last sentences of a paragraph, contribute substantially to its meaning. This thesis incorporates them into Site Q with appropriate weights, yielding Topic-Site Q, a multi-feature-fusion paragraph retrieval algorithm that accounts for first/last paragraphs and first/last sentences. It computes the semantic similarity of the first and last sentences with a multi-feature fusion algorithm, weights their contribution to paragraph relevance, and raises the scores of the first and last paragraphs. Paragraphs are then ranked by this score, and a set of candidate paragraphs is returned.

(2) An improved answer extraction algorithm based on the semantic dependency tree. The existing algorithm examines mainly semantic and syntactic structure, a single perspective that is not comprehensive enough. Word frequency is an important semantic feature and should be reflected in answer extraction. This thesis therefore adds keyword frequency to the dependency-tree-based algorithm and fuses the two with a log-linear model.

(3) Design and implementation of an improved search-engine-based Chinese QA system, optimized with the two improved algorithms. The thesis first analyzes the system requirements in detail, then describes the overall architecture and gives a system architecture diagram. The detailed design and implementation part discusses, module by module, each module's function, processing flow, and implementation details, together with the core algorithms it uses and their improvements.

(4) Experimental evaluation. To verify the effectiveness of the proposed optimizations, a manually constructed question test set was used to evaluate the improvements to the algorithms and the system. MRR, precision, recall, and F1 were computed for the two algorithms before and after improvement and for the system using the improved algorithms, and the results were compared. The experiments show that the algorithmic improvements are effective and that the system performs better with the improved algorithms.
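Item (1) can be illustrated with a minimal Python sketch of position-aware paragraph scoring. It assumes pre-tokenized text (a document is a list of paragraphs, each a list of sentences, each a list of words) and substitutes a simple keyword-coverage score for the original Site Q passage score, which the abstract does not spell out. The two-feature fused similarity (Jaccard overlap plus bag-of-words cosine) and the weights ALPHA, BETA, and FEATURE_WEIGHTS are hypothetical placeholders, not the thesis's published features or values.

```python
from collections import Counter
from math import sqrt

# Hypothetical parameters: the thesis's actual weights are not given in the abstract.
ALPHA = 0.3                    # weight of first/last-sentence similarity
BETA = 1.2                     # boost for the document's first and last paragraphs
FEATURE_WEIGHTS = (0.5, 0.5)   # (Jaccard, cosine) weights in the fused similarity


def jaccard(a, b):
    """Overlap of the word sets of two token lists."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity of term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(count * cb[word] for word, count in ca.items())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def fused_similarity(question, sentence):
    """Multi-feature fusion: weighted sum of two lexical similarity features."""
    w_jaccard, w_cosine = FEATURE_WEIGHTS
    return w_jaccard * jaccard(question, sentence) + w_cosine * cosine(question, sentence)


def base_score(question, paragraph):
    """Stand-in for the Site Q score: fraction of question words found in the paragraph."""
    words = {w for sentence in paragraph for w in sentence}
    q = set(question)
    return len(q & words) / len(q) if q else 0.0


def topic_score(question, paragraph, index, n_paragraphs):
    """Score one non-empty paragraph at position `index` in the document."""
    score = base_score(question, paragraph)
    # Add weighted similarity of the first and last sentences (once if they coincide).
    for i in {0, len(paragraph) - 1}:
        score += ALPHA * fused_similarity(question, paragraph[i])
    # Boost the document's first and last paragraphs.
    if index in (0, n_paragraphs - 1):
        score *= BETA
    return score


def rank_paragraphs(question, document, top_k=3):
    """Return (score, paragraph_index) pairs for the top_k candidate paragraphs."""
    n = len(document)
    scored = [(topic_score(question, p, i, n), i) for i, p in enumerate(document)]
    return sorted(scored, reverse=True)[:top_k]


question = ["中国", "首都"]
document = [
    [["天气", "晴朗"], ["适合", "出游"]],
    [["中国", "的", "首都", "是", "北京"], ["北京", "历史", "悠久"]],
    [["人口", "众多"]],
]
print(rank_paragraphs(question, document))
```

Here the middle paragraph, whose first sentence contains both question keywords, outranks the first and last paragraphs. The position boost is multiplicative, so it amplifies an existing relevance score but cannot make an irrelevant paragraph relevant.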
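The log-linear fusion in item (2) can be sketched as P(a | q) = exp(Σᵢ λᵢ·fᵢ(a)) / Z, where Z normalizes over all candidate answers. In the sketch below, the dependency-tree score is assumed to be precomputed by a separate parser-based matcher (not shown). The keyword-frequency feature is a hypothetical log-scaled count of question keywords in the candidate's sentence, and the λ weights in LAMBDAS are illustrative. In practice they would be fit on training data.

```python
import math

# Hypothetical feature weights; real values would be learned on training data.
LAMBDAS = {"dep_tree": 1.0, "keyword_freq": 0.5}


def keyword_frequency(question_keywords, sentence_tokens):
    """Hypothetical frequency feature: log(1 + occurrences of question keywords)."""
    keywords = set(question_keywords)
    return math.log1p(sum(1 for tok in sentence_tokens if tok in keywords))


def log_linear_rank(candidates):
    """Rank (answer, features) pairs by P(a|q) = exp(sum_i lambda_i * f_i(a)) / Z.

    `features` maps feature names in LAMBDAS to values; the dependency-tree
    score is assumed to be computed elsewhere.
    """
    if not candidates:
        return []
    scores = [sum(LAMBDAS[name] * value for name, value in feats.items())
              for _, feats in candidates]
    top = max(scores)  # shift by the max score to avoid overflow in exp
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    ranked = [(answer, w / z) for (answer, _), w in zip(candidates, weights)]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


q_keywords = ["中国", "首都"]
candidates = [
    ("上海", {"dep_tree": 0.8,
              "keyword_freq": keyword_frequency(q_keywords, ["上海", "是", "大", "城市"])}),
    ("北京", {"dep_tree": 0.8,
              "keyword_freq": keyword_frequency(q_keywords, ["中国", "的", "首都", "是", "北京"])}),
]
print(log_linear_rank(candidates))
```

When two candidates tie on the dependency-tree score, the frequency feature breaks the tie in favor of the answer whose sentence contains more question keywords. Subtracting the maximum score before exponentiating leaves the probabilities unchanged but avoids overflow.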
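Item (4) reports MRR, precision, recall, and F1. The abstract does not give the exact definitions used, so this sketch assumes common ones. MRR averages 1/rank of the first correct answer per question, scoring 0 when no correct answer is returned. Precision, recall, and F1 are micro-averaged over sets of returned and gold answers.

```python
def mean_reciprocal_rank(ranked_lists, gold):
    """ranked_lists[i]: answers for question i, best first; gold[i]: set of correct answers.

    Questions with no correct answer in their list (or no list at all) contribute 0.
    """
    total = 0.0
    for answers, correct in zip(ranked_lists, gold):
        for rank, answer in enumerate(answers, start=1):
            if answer in correct:
                total += 1.0 / rank
                break
    return total / len(gold) if gold else 0.0


def precision_recall_f1(returned, gold):
    """Micro-averaged over questions; returned[i] and gold[i] are sets of answers."""
    tp = sum(len(r & g) for r, g in zip(returned, gold))
    n_returned = sum(len(r) for r in returned)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_returned if n_returned else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


ranked_lists = [["A", "B"], ["C", "D"], ["E"]]
gold = [{"A"}, {"D"}, {"X"}]
print(mean_reciprocal_rank(ranked_lists, gold))
print(precision_recall_f1([{"A"}, {"C"}, {"E"}], gold))
```

In this toy run, the first correct answers appear at rank 1, at rank 2, and not at all, so MRR = (1 + 1/2 + 0) / 3 = 0.5.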
[Degree-granting institution]: Beijing University of Technology
[Degree level]: Master's
[Year awarded]: 2016
[Classification number]: TP391.3


Article ID: 1345620


Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/1345620.html

