Research on Question-Type-Sensitive Answer Summarization Algorithms for Q&A Communities
Published: 2018-10-12 21:01
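[Abstract]: With the rapid development of Web interaction technologies, Q&A communities such as Baidu Zhidao and Yahoo! Answers have accumulated large amounts of question-answer resources, offering a new way to address open-ended questions. However, question-answer pairs in these communities suffer from quality problems such as low credibility and heavy noise, which make reusing these resources very difficult. Mining high-quality resources from Q&A communities has therefore become an important research task in recent years. Many studies obtain high-quality question-answer pairs by evaluating answer quality, but few consider that a single answer to an open-ended question is often incomplete. Starting from the full set of answers, with the ultimate goal of obtaining complete, high-quality answers, this thesis studies answer summarization algorithms for different types of questions. The research covers three main aspects:

(1) To design more effective answer summarization algorithms for different question types, this thesis studies question classification for Q&A communities. First, it proposes a two-level question taxonomy for Q&A communities. Second, it analyzes the differences between factoid questions and Q&A community questions; in feature extraction, in addition to the lexical and syntactic features used for factoid questions, it introduces community features of questions and analyzes the distribution of each kind of feature. Finally, it selects the best feature combination through incremental feature combination, and introduces a two-stage active learning strategy that makes full use of unlabeled samples to improve classification performance.

(2) This thesis represents answers with topic words, adapting traditional topic-word extraction methods to extract the topics of answers, so that topic words better express the semantic information of an answer. It designs quantification methods for the evaluation criteria of answer coverage, relevance to the question, and content quality, and uses these quantified criteria to guide the summarization process. After analyzing the characteristics of answers to each question type, it designs a sentence scoring function that combines coverage, relevance to the question, and content quality, and on this basis proposes answer summarization algorithms for three question types: consultation, opinion, and survey questions. Experimental results show that the proposed algorithms substantially improve summary quality.

(3) This thesis implements a question-type-sensitive answer summarization system that integrates the above question classification and answer summarization methods. Users can retrieve a question in the system and obtain a summary of its answers, which greatly improves the efficiency of information access. The system also improves how summaries are presented: survey questions are shown to users as charts, and opinion questions are displayed item by item according to the sentiment polarity of the answers, both of which improve the readability of the answer summaries.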
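The abstract states only that the best feature combination is selected by "incremental feature combination"; it does not spell out the procedure. The sketch below assumes it means greedy forward selection over the three feature groups the abstract names (lexical, syntactic, and community features). The function name `incremental_feature_combination`, the caller-supplied `evaluate` callback (which in practice might train and cross-validate the question classifier on the chosen groups), and the toy scores are all hypothetical and chosen for illustration; this is not the thesis's code, and it does not model the two-stage active learning step.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def incremental_feature_combination(
    groups: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float, List[Tuple[str, float]]]:
    """Greedy forward selection over feature groups (hypothetical helper).

    Start from the empty combination, add the group whose addition scores
    highest, and stop as soon as no remaining group improves the score.
    Returns the chosen groups, their score, and the order they were added.
    """
    selected: FrozenSet[str] = frozenset()
    best_score = evaluate(selected)
    history: List[Tuple[str, float]] = []
    remaining = list(groups)
    while remaining:
        # Score every one-group extension of the current combination.
        score, group = max((evaluate(selected | {g}), g) for g in remaining)
        if score <= best_score:
            break  # adding any further group does not help
        selected = selected | {group}
        best_score = score
        history.append((group, score))
        remaining.remove(group)
    return selected, best_score, history


# Toy accuracies, invented for illustration only (not results from the thesis).
TOY_SCORES = {
    frozenset(): 0.50,
    frozenset({"lexical"}): 0.70,
    frozenset({"syntactic"}): 0.62,
    frozenset({"community"}): 0.58,
    frozenset({"lexical", "syntactic"}): 0.74,
    frozenset({"lexical", "community"}): 0.77,
    frozenset({"syntactic", "community"}): 0.66,
    frozenset({"lexical", "syntactic", "community"}): 0.76,
}

if __name__ == "__main__":
    best, score, steps = incremental_feature_combination(
        ["lexical", "syntactic", "community"], TOY_SCORES.__getitem__)
    print(sorted(best), score)  # ['community', 'lexical'] 0.77
```

Selecting whole feature groups rather than individual features keeps the search small: for n groups, the greedy loop needs at most n(n+1)/2 evaluations beyond the empty baseline, instead of trying all 2^n combinations, at the cost of possibly missing a combination that only pays off when two groups are added together.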
[Abstract]: With the rapid development of Web interaction technologies, Q&A communities such as Baidu Zhidao and Yahoo! Answers have accumulated a large number of question-answer resources, which provide a new way to solve open-ended questions. However, question-answer pairs in these communities have low reliability and high noise, and these quality problems make the resources very difficult to reuse. Mining high-quality resources from Q&A communities has become an important research task in recent years. Many studies obtain high-quality question-answer pairs by evaluating answer quality, but rarely consider that a single answer to an open-ended question is often incomplete. Starting from the set of answers, with the ultimate goal of obtaining complete, high-quality answers, this thesis studies answer summarization algorithms for different types of questions. The main contents are as follows:

(1) To design more effective answer summarization algorithms for different types of questions, this thesis studies question classification in Q&A communities. First, a two-level question classification scheme for Q&A communities is proposed. Second, the differences between factoid questions and community questions are analyzed: in addition to the lexical and syntactic features used for factoid questions, this thesis introduces community features of questions and analyzes the distribution of each feature type. Finally, the best feature combination is selected by incremental feature combination, and a two-stage active learning strategy is introduced to make full use of unlabeled samples and improve question classification.

(2) Topic words are introduced to represent answers: traditional topic-word extraction methods are adapted to extract the topics of answers, so that the semantic information of each answer is better expressed. This thesis designs quantitative measures for evaluation criteria including answer coverage, relevance to the question, and content quality, and uses these measures to guide the summarization process. Based on an analysis of the characteristics of answers to each question type, it designs a sentence scoring function that combines coverage, relevance to the question, and content quality. On this basis, it proposes answer summarization algorithms for three question types: consultation, opinion, and survey questions. Experimental results show that the proposed algorithms considerably improve summary quality.

(3) A question-type-sensitive answer summarization system is implemented that integrates the above question classification and answer summarization methods. Users retrieve a question in the system and obtain a summary of its answers, which greatly improves the efficiency of obtaining information. The system also improves how summaries are presented: survey questions are presented to users as charts, and opinion questions are displayed item by item according to the sentiment polarity of the answers, both of which enhance the readability of the summaries.
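The abstract describes a sentence scoring function that combines answer coverage, relevance to the question, and content quality, but does not give its form. The sketch below assumes a weighted linear combination with hypothetical weights `(0.4, 0.4, 0.2)`: relevance is bag-of-words cosine similarity to the question, coverage is the share of topic words a sentence would newly add to the summary, and content quality is a per-sentence score supplied by the caller. Sentences are then picked greedily. Topic-word extraction and tokenization (Chinese text would first need word segmentation) are assumed to happen upstream. The names `greedy_answer_summary` and `cosine` are illustrative; this is not the thesis's algorithm, and it does not distinguish consultation, opinion, and survey questions.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def greedy_answer_summary(
    question: List[str],
    sentences: Sequence[List[str]],
    quality: Sequence[float],
    topic_words: Set[str],
    k: int = 2,
    weights: Tuple[float, float, float] = (0.4, 0.4, 0.2),
) -> List[int]:
    """Return indices of up to k answer sentences, chosen greedily.

    score = w_cov * coverage + w_rel * relevance + w_qual * quality,
    where coverage counts only topic words not yet in the summary.
    """
    w_cov, w_rel, w_qual = weights
    q_vec = Counter(question)
    covered: Set[str] = set()
    chosen: List[int] = []
    for _ in range(min(k, len(sentences))):
        best_i, best_score = -1, -math.inf
        for i, tokens in enumerate(sentences):
            if i in chosen:
                continue
            # Coverage: share of topic words this sentence would newly add.
            new_topics = (set(tokens) & topic_words) - covered
            cov = len(new_topics) / len(topic_words) if topic_words else 0.0
            # Relevance: lexical similarity to the question.
            rel = cosine(Counter(tokens), q_vec)
            score = w_cov * cov + w_rel * rel + w_qual * quality[i]
            if score > best_score:
                best_i, best_score = i, score
        chosen.append(best_i)
        covered |= set(sentences[best_i]) & topic_words
    return chosen


if __name__ == "__main__":
    question = ["how", "learn", "python"]
    sentences = [
        ["read", "the", "official", "python", "tutorial"],
        ["practice", "python", "daily", "with", "small", "projects"],
        ["i", "like", "coffee"],
    ]
    quality = [0.8, 0.9, 0.1]  # hypothetical content-quality scores
    topics = {"tutorial", "practice", "projects", "python"}
    print(greedy_answer_summary(question, sentences, quality, topics))  # [1, 0]
```

Because coverage is recomputed against the words already in the summary, a sentence that repeats covered topic words scores lower in later rounds, which is one simple way to keep the summary from collecting near-duplicate answers.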
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.09
Article ID: 2267580
Article link: https://www.wllwen.com/guanlilunwen/ydhl/2267580.html