Answer Summarization for Community Question Answering Based on Information Needs

Published: 2018-03-15 01:32

Topic: answer summarization　Focus: information needs　Source: Harbin Institute of Technology, master's thesis, 2013　Document type: degree thesis

[Abstract]: In recent years, community question answering (CQA) portals have emerged as a new kind of knowledge-sharing platform. They open new opportunities for question answering systems by supplying large numbers of usable questions and their corresponding answers. Because they are interactive and open, they can meet users' information needs well. Many online CQA portals have appeared, including Baidu Zhidao, Soso Wenwen, and Tianya Wenda. The rise of this new way of acquiring knowledge shows that the keyword-only query mode of traditional search engines can no longer satisfy users' need to find the information they want accurately and quickly. A CQA portal lets users ask questions by posting. A post can contain the question itself plus a question description that supplements its context. Any user can answer the question, and the asker can choose one of the answers as the best answer.

However, because users are often unfamiliar with the domain of the information they are looking for, they may be unable to phrase their queries well enough to get the information they need. This work therefore expands the user's query question over a CQA portal, together with the associated question descriptions and answers. The question set and the answer set, which cover a variety of information needs, are then co-clustered to obtain the different aspects of information need related to the user's query. The same question under different question descriptions (contextual information) is very likely to produce completely different answers. For example, for the question "How do I buy a mobile phone?", the information need differs sharply depending on whether the description is "Where should I go to buy one?" or "How can I get one cheaply?". To address this, constraints based on the question descriptions are introduced into the co-clustering model.

In addition, the quality of the answers provided by CQA respondents is uneven, and useful information is sometimes accompanied by useless or even incorrect information. Such highly redundant content is hard to use directly in a question answering system. This work collects textual and non-textual features of the answers, builds an answer ranking model, and summarizes answers from the ranked content within each cluster produced by co-clustering. A large answer set inevitably contains much repeated information, and because repeated answers may be worded differently, they cannot be detected by simple similarity computation. This work therefore builds an answer similarity detection model that uses hierarchical multi-classifier voting to detect and remove repeated information in the answers, finally producing a correct answer summary.
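The abstract does not specify the classifiers or the hierarchy used for duplicate detection, so the following is only a minimal sketch of the general idea of removing repeated answers by classifier voting. It assumes a flat majority vote over three hypothetical similarity detectors (`jaccard_vote`, `sequence_vote`, `containment_vote`) with arbitrary thresholds, and it assumes whitespace-tokenized text, which would need a word segmenter for Chinese. None of these names or values come from the thesis.

```python
from difflib import SequenceMatcher


def tokens(text):
    """Return the set of lowercase whitespace-separated tokens."""
    return set(text.lower().split())


def jaccard_vote(a, b, threshold=0.5):
    """Hypothetical detector: vote 'duplicate' when token-set overlap is high."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return False
    return len(ta & tb) / len(ta | tb) >= threshold


def sequence_vote(a, b, threshold=0.7):
    """Hypothetical detector: vote 'duplicate' when character sequences largely match."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def containment_vote(a, b, threshold=0.8):
    """Hypothetical detector: vote 'duplicate' when the shorter answer is mostly contained in the longer."""
    ta, tb = tokens(a), tokens(b)
    shorter, longer = (ta, tb) if len(ta) <= len(tb) else (tb, ta)
    if not shorter:
        return False
    return len(shorter & longer) / len(shorter) >= threshold


# Assumed ensemble; the thesis describes a hierarchical scheme that is not reproduced here.
VOTERS = (jaccard_vote, sequence_vote, containment_vote)


def is_duplicate(a, b):
    """Flag a pair as duplicate when a strict majority of detectors agree."""
    votes = sum(1 for voter in VOTERS if voter(a, b))
    return votes * 2 > len(VOTERS)


def summarize(ranked_answers):
    """Walk answers in rank order and keep each one unless it duplicates a kept answer."""
    kept = []
    for answer in ranked_answers:
        if not any(is_duplicate(answer, earlier) for earlier in kept):
            kept.append(answer)
    return kept
```

Calling `summarize` on a list that is already sorted by the ranking model keeps the highest-ranked wording of each repeated answer and drops the later copies. A trained classifier could replace any of the threshold-based detectors without changing the voting logic.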
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP391.1
