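Research on Quality Evaluation of Online Encyclopedia Articles
Published: 2018-03-18 11:11
Topic: Wikipedia  Focus: quality features  Source: National University of Defense Technology, 2014 master's thesis  Document type: degree thesis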
[Abstract]: In the twenty-first century, the spread and application of the Internet have brought rapid growth in the volume of information. People benefit from convenient access to vast amounts of information, yet find it hard to judge the quality of that information. Wikipedia is a typical case. Every day a large number of Wikipedia articles are created, modified, and edited, but quality assessment cannot keep pace with the rate at which articles change. Wikipedia relies on a manual quality assessment system, which severely limits the efficiency of assessment. As a result, users do not know whether a considerable share of the articles are good or bad. Although Wikipedia has also introduced a user rating mechanism, its results are poor because of human subjectivity. To address these problems of efficiency and objectivity in quality assessment on Wikipedia, this thesis studies the quality assessment of articles and develops automatic assessment methods. We first study the quality-related features of Wikipedia articles, laying the groundwork for the subsequent assessment work. On the one hand, high-quality Wikipedia articles should have the general characteristics of traditional encyclopedia entries, such as comprehensive content and accurate information. On the other hand, Wikipedia articles are written in a fundamentally different way from traditional encyclopedias: Wikipedia adopts a crowdsourcing model in which articles are written not by a small board of expert editors but by many editors working together. The quality of an article is therefore closely tied to its edit history, and we also mine features from that history. We select and analyze features from article content and edit history, and take as our objects of study those features that can be evaluated by machine. We approach quality assessment from two angles, classification and ranking. For classification, we develop an SVM-based method to distinguish high-quality article candidates from ordinary articles. This lets us select high-quality articles from Wikipedia's massive collection as candidates, overcoming the shortcomings of manual selection. We also study whether candidates can be promoted during the review stage; we find that machine classification does not perform well on this task, so it cannot fully replace manual review. Next, we study methods for ranking the articles within a domain by quality. We first apply the PageRank model to the bipartite network formed by articles and editors. We find that when PageRank is used directly to compute rankings at convergence, high-quality articles rank relatively low and the quality ranking is poor. We therefore take a different approach: we use high-quality articles to measure the skill of editors, and from that derive a quality ranking of articles. On this basis, the weighting algorithm we develop from edit-history features significantly improves ranking performance. This thesis takes Wikipedia as a representative online encyclopedia, and its findings offer an important reference for studying content quality assessment on other online encyclopedias and crowdsourcing websites.
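The second ranking idea in the abstract, using known high-quality articles to gauge editors and then scoring articles by who edited them, might be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the thesis's actual formulas, so the scoring rule (share of an editor's edits that went to high-quality articles), the use of edit counts as weights, and the names `editor_scores` and `rank_articles` are all hypothetical.

```python
from collections import defaultdict


def editor_scores(edits, high_quality):
    """Score each editor by the share of their edits made to known
    high-quality articles (an assumed rule, not the thesis's formula).

    edits: iterable of (editor, article, n_edits) tuples.
    high_quality: set of article names already judged high quality.
    """
    total = defaultdict(int)
    on_high_quality = defaultdict(int)
    for editor, article, n_edits in edits:
        total[editor] += n_edits
        if article in high_quality:
            on_high_quality[editor] += n_edits
    return {editor: on_high_quality[editor] / total[editor] for editor in total}


def rank_articles(edits, high_quality):
    """Rank articles by the edit-count-weighted mean score of their editors.

    Edit counts stand in for the edit-history weighting mentioned in the
    abstract; the real weighting features are not specified there.
    """
    scores = editor_scores(edits, high_quality)
    weighted_sum = defaultdict(float)
    weight = defaultdict(int)
    for editor, article, n_edits in edits:
        weighted_sum[article] += n_edits * scores[editor]
        weight[article] += n_edits
    article_score = {a: weighted_sum[a] / weight[a] for a in weight}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(article_score, key=lambda a: (-article_score[a], a))


# Toy data (invented for illustration): article "A" is the known
# high-quality seed, and "B" shares an editor with it.
toy_edits = [
    ("alice", "A", 10),
    ("alice", "B", 5),
    ("bob", "B", 5),
    ("bob", "C", 10),
    ("carol", "C", 2),
]
print(rank_articles(toy_edits, {"A"}))  # ['A', 'B', 'C']
```

In this toy run, "B" outranks "C" only because one of its editors also worked on the seed article, which is the intuition the abstract describes. A real system would need to keep the seed articles separate from the articles being ranked to avoid scoring the seeds on their own labels.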
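[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.092;TP391.1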
Article ID: 1629332
Article link: https://www.wllwen.com/guanlilunwen/ydhl/1629332.html