Research on Community Detection Techniques in Community Question Answering Systems and Their Applications
Published: 2018-08-28 20:10
[Abstract]: Community-based question answering (CQA) systems, such as Yahoo! Answers and Baidu Zhidao, pool collective intelligence to provide free, personalized solutions to questions. However, CQA systems have no explicit community structure, so their "community" properties have not been fully exploited. CQA systems are also highly open: knowledge content is shared and reachable by search engines, which leaves the systems vulnerable to intrusion by fake accounts, makes account behavior patterns complex, and sharply degrades knowledge quality.

To address these problems, it is necessary to study in depth the account behavior patterns and network properties of CQA systems. This work also supports tasks such as related-user recommendation, merging of similar question-and-answer content, discovery of new topics, fake-user identification, and personalized question answering services, all of which can improve knowledge quality in CQA systems.

This thesis takes Baidu Zhidao, the largest CQA system in China, as a representative case and analyzes the behavior patterns of its accounts. By examining the question-and-answer relationships between accounts, it constructs two network models that show the basic network properties of CQA systems. To detect interest-centered account communities, it proposes MSLPA (Multilayer speaker-listener label propagation algorithm), a community detection algorithm for CQA systems based on the label propagation algorithm SLPA. MSLPA is evaluated in terms of network scale, community topics, aggregation quality, and hierarchical structure. Compared with several existing community detection algorithms, MSLPA finds meaningful, overlapping, hierarchically structured account communities in large-scale CQA networks, avoids generating large numbers of tiny communities, and effectively groups related accounts.

Building on MSLPA, the thesis proposes a method for identifying fake accounts in CQA systems. It first defines a set of highly discriminative account attributes, comprising individual account attributes with a clear physical meaning and properties of the communities to which the accounts belong. The individual attributes are obtained by statistical analysis, and the community properties are taken from the community detection results. The attribute set is then applied to a simple J48 decision tree classifier that labels each account as normal or fake. Experimental results show that the method performs well and substantially improves classification accuracy.
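The abstract names SLPA as the basis of MSLPA but does not describe the multilayer extension. As a rough illustration only, the Python sketch below shows baseline SLPA (not MSLPA) on a small undirected graph. The graph built from (asker, answerer) pairs, the function names `build_graph` and `slpa`, and all parameter values are assumptions made for this example; they are not the thesis's two network models or its settings.

```python
import random
from collections import Counter, defaultdict


def build_graph(qa_pairs):
    """Hypothetical helper: undirected graph from (asker, answerer) pairs."""
    adjacency = defaultdict(set)
    for asker, answerer in qa_pairs:
        if asker != answerer:  # ignore self-answers
            adjacency[asker].add(answerer)
            adjacency[answerer].add(asker)
    return dict(adjacency)


def slpa(adjacency, iterations=20, threshold=0.1, seed=0):
    """Baseline SLPA on {node: set(neighbors)}; returns {label: set(nodes)}."""
    rng = random.Random(seed)
    # Each node's memory starts with its own id as the only label.
    memory = {node: [node] for node in adjacency}
    nodes = sorted(adjacency)

    for _ in range(iterations):
        rng.shuffle(nodes)  # visit listeners in random order
        for listener in nodes:
            speakers = sorted(adjacency[listener])
            if not speakers:
                continue
            # Speaking rule: each neighbor sends one label drawn from its
            # memory list, i.e. with probability proportional to frequency.
            heard = Counter(rng.choice(memory[s]) for s in speakers)
            # Listening rule: store the most frequent label heard,
            # breaking ties at random.
            top = max(heard.values())
            winners = sorted(lab for lab, cnt in heard.items() if cnt == top)
            memory[listener].append(rng.choice(winners))

    # Post-processing: a node joins every community whose label makes up
    # at least `threshold` of its memory, so communities may overlap.
    communities = defaultdict(set)
    for node, labels in memory.items():
        for label, count in Counter(labels).items():
            if count / len(labels) >= threshold:
                communities[label].add(node)
    return dict(communities)


if __name__ == "__main__":
    # Made-up accounts: two triangles joined by the edge c-d.
    pairs = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
             ("d", "e"), ("e", "f"), ("d", "f")]
    for label, members in slpa(build_graph(pairs)).items():
        print(label, sorted(members))
```

Because a node keeps every label above the threshold, one account can belong to several communities, which matches the overlapping communities the abstract describes. A lower threshold yields more overlap, and a higher one yields fewer and more disjoint communities.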
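[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.092
Article ID: 2210486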
Article link: https://www.wllwen.com/kejilunwen/sousuoyinqinglunwen/2210486.html