Credibility Analysis of Answers in Interactive Q&A Communities

Published: 2018-06-27 20:28

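Topics: interactive Q&A community + multi-word expression; source: master's thesis, Beijing Information Science and Technology University, 2013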

[Abstract]: In recent years, with the development of Web 2.0, users have become editors of web content as well as readers of it, and a large number of User Generated Content applications have emerged. The interactive question answering community (Question Answering Community) is one such application. Its basic model is that users post questions according to their own needs and other users answer them. Because the users who answer are diverse, the credibility of their answers varies widely, and answers of differing credibility strongly affect both the asker and later readers of the question. Judging the credibility of answers has therefore become a central problem for Q&A communities. This thesis studies answer credibility analysis in interactive Q&A communities in three parts: extraction of multi-word expressions from questions, classification of answer credibility, and identification of the most credible answer.

First, multi-word expression extraction. Multi-word expressions extracted from questions are used mainly for question understanding and for building a credible information base. Based on the characteristics of multi-word expressions in community questions, an extraction method suited to interactive Q&A communities is proposed. Candidate multi-word expressions are first extracted from questions using mutual information and a stop-word list, and are then divided into four classes: correct strings, incomplete strings, redundant strings, and erroneous strings. A correction method for the candidates is designed that draws on search engines' optimization of query strings and on the web retrieval results for each candidate. Experiments on questions from the Sina iAsk (新浪爱问知识人) question bank give a precision, recall, and F-measure of 84%, 52%, and 0.64 respectively, a good experimental result.

Second, answer credibility classification. To reflect the characteristics of interactive Q&A communities, two new features of the answer text are proposed, text normativity and uncertainty of tone, so that credibility can be classified from more angles. A Logistic Regression model combines these with classical text features, statistical features, and user features to analyze answer credibility. Experiments on question-answer pairs from the medical and health domain of Sina iAsk show that, on top of the classical features, the proposed normativity and uncertainty-tone features improve the accuracy of credibility classification, which verifies their effectiveness.

Third, identification of the most credible answer. A method for building a credible information base is proposed, together with an approach that combines the credible information base with the traditional basic features of question-answer pairs, which considerably improves identification results. Credible question-answer pairs and credible material related to the question are selected as the main contents of the information base, and an organizational structure is designed to link the two, which makes the information base convenient to use. A method for using the credible information base is proposed, and experiments verify that building it is effective for identifying the most credible answer. With the identification method proposed in this thesis, most-credible-answer identification achieves good experimental results.
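The candidate-extraction step in the first part (mutual information plus a stop-word list) can be illustrated with a minimal sketch. The abstract does not give the thesis's exact formulation, so everything below is an assumption: candidates are limited to adjacent token pairs, scoring uses pointwise mutual information, and the function name `pmi_candidates`, the `threshold` and `min_count` parameters, and the toy English questions are hypothetical. The later correction step based on search-engine results is not shown.

```python
import math
from collections import Counter


def pmi_candidates(sentences, stopwords, threshold=1.0, min_count=2):
    """Return adjacent token pairs whose pointwise mutual information
    is at least `threshold`, skipping pairs that contain a stop word.

    Hypothetical helper: bigram-only, not the thesis's actual procedure.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    candidates = {}
    for (a, b), count in bigrams.items():
        # Rare pairs give unreliable PMI estimates; stop words are filtered out.
        if count < min_count or a in stopwords or b in stopwords:
            continue
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        pmi = math.log2(p_ab / (p_a * p_b))
        if pmi >= threshold:
            candidates[(a, b)] = pmi
    return candidates


# Toy, made-up questions (already tokenized) purely for illustration.
questions = [
    ["how", "to", "treat", "high", "blood", "pressure"],
    ["is", "high", "blood", "pressure", "dangerous"],
    ["what", "causes", "low", "blood", "pressure"],
    ["how", "to", "lower", "blood", "sugar"],
]
stopwords = {"how", "to", "is", "what"}

candidates = pmi_candidates(questions, stopwords)
for pair, score in sorted(candidates.items()):
    print(" ".join(pair), round(score, 2))
```

In this toy run the frequent pair "how to" is dropped by the stop-word list even though it co-occurs often, which is the role the stop-word list plays alongside the mutual-information score. Candidates produced this way would still contain incomplete or redundant strings, which is what the correction step described in the abstract is meant to handle.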
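The credibility classification in the second part can likewise be sketched as a plain logistic regression over numeric answer features. This is only an illustration under stated assumptions: the abstract does not say how the uncertainty-tone and text-normativity features are computed, so each answer is represented here by two made-up scores in [0, 1], and the training data, the helper names `train_logistic` and `credibility`, and the learning-rate settings are all hypothetical. The classical text, statistical, and user features used in the thesis are omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent
    on the mean log-loss. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one answer: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def credibility(w, b, x):
    """Estimated probability that an answer with features x is credible."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up feature vectors: (uncertainty_tone, text_normativity).
# Label 1 = credible answer, 0 = not credible. Illustrative values only.
X = [(0.0, 0.9), (0.1, 0.8), (0.05, 0.7), (0.5, 0.3), (0.6, 0.2), (0.4, 0.1)]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
print(credibility(w, b, (0.05, 0.85)))  # confident, well-formed answer
print(credibility(w, b, (0.55, 0.15)))  # hedged, poorly formed answer
```

On data shaped like this, the fitted model assigns a higher credibility probability to answers with little hedging and well-formed text. Whether the same pattern holds on real community data is an empirical question that the thesis addresses with its Sina iAsk experiments, not something this sketch demonstrates.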
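[Degree-granting institution]: Beijing Information Science and Technology University
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP393.092; TP391.1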

[References]