Extracting Hierarchical Knowledge Networks from Folksonomy Based on Power-Law Distribution and Fractals
Published: 2018-06-16 09:13
Topic: Folksonomy + relation frequency; Source: Northeast Normal University, master's thesis, 2017
[Abstract]: Since Thomas Vander Wal first proposed the concept of "Folksonomy" in 2004, the Folksonomy knowledge organization model has been adopted by many kinds of resource websites to organize their content. It differs from traditional knowledge organization systems in that, in the modern open semantic web environment, individual users annotate resources freely instead of following rules set by domain authorities, so it appears chaotic and discrete on the surface. For this reason it has attracted a wave of academic research, and building tag knowledge networks to study Folksonomy is now an accepted approach in the field. Because Folksonomy relies on social tagging, such research usually has to process massive amounts of data. Alongside the advantages of big-data thinking, it must also deal with the "low value" problem of big data: an open web environment combined with free social tagging fills the tag set with vague, ambiguous, and even erroneous information. In some related studies, researchers set thresholds themselves to filter the data. This ensures the significance and validity of the data to some extent, but it raises other problems. First, the choice of threshold lacks theoretical grounding. Second, it is unclear whether the data extracted by the threshold are equivalent to the original data. Third, it is unclear whether results remain comparable across multiple time periods or multiple types of problems. A method for extracting hierarchical knowledge networks that ensures data significance, rests on solid theory, keeps the extracted network equivalent to the original network, and offers a degree of comparability is therefore an urgent problem.
This thesis uses BibSonomy, a system set up and maintained by the Knowledge and Data Engineering group at the University of Kassel, Germany, as its data source and collects five domain knowledge datasets from it. Domain knowledge networks are constructed from tag co-occurrence relations, and the frequency distribution of the association relations in these networks is analyzed statistically. On this basis, following power-law distribution and fractal theory, thresholds are set on knowledge association frequency to extract hierarchical knowledge networks. Because earlier studies have confirmed that the degree distribution of domain knowledge networks built from tag co-occurrence follows a power law and that such networks exhibit the small-world effect, the extracted hierarchical knowledge networks are tested mainly on two aspects: the power-law distribution of degree values and the small-world effect. The results show that hierarchical knowledge networks extracted with knowledge association frequency as the threshold exhibit good power-law characteristics (scale-free networks) and the small-world effect, which verifies their equivalence to the original knowledge networks. Hence, in the Folksonomy knowledge organization model, hierarchical knowledge networks extracted with knowledge association frequency as the threshold retain the overall characteristics of the original network.
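The pipeline described in the abstract (tag co-occurrence network, frequency threshold, degree distribution check) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the function names are hypothetical, the threshold is passed in as a plain number because the abstract does not state how it is derived from power-law and fractal theory, and the log-log least-squares slope is only a crude stand-in for a proper power-law fit.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(posts):
    """Count how often each unordered tag pair is attached to the same resource.

    `posts` is an iterable of tag lists, one list per annotated resource.
    """
    counts = Counter()
    for tags in posts:
        # Deduplicate and sort so each pair has one canonical (a, b) form.
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts


def extract_layer(counts, threshold):
    """Keep only edges whose co-occurrence frequency is >= threshold.

    Assumption: `threshold` is supplied by the caller; the thesis derives it
    from the frequency distribution, which is not reproduced here.
    """
    return {edge: w for edge, w in counts.items() if w >= threshold}


def degrees(edges):
    """Degree of each tag in an undirected edge collection."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def loglog_slope(values):
    """Least-squares slope of log(count) against log(value).

    For data following a power law the points fall near a straight line with
    negative slope. This is a rough diagnostic, not a rigorous estimator.
    """
    hist = Counter(values)
    if len(hist) < 2:
        raise ValueError("need at least two distinct values")
    xs = [math.log(v) for v in hist]
    ys = [math.log(c) for c in hist.values()]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy data standing in for a BibSonomy export (hypothetical tags).
posts = [
    ["web", "semantic", "ontology"],
    ["web", "semantic"],
    ["web", "tagging"],
    ["semantic", "ontology"],
]
counts = cooccurrence_counts(posts)
layer = extract_layer(counts, threshold=2)
layer_degrees = degrees(layer)
```

On real data, `loglog_slope(counts.values())` gives a first look at the association frequency distribution, and `loglog_slope(degrees(layer).values())` does the same for the degree distribution of an extracted layer. The small-world test mentioned in the abstract (clustering coefficient and average path length) is left out of this sketch.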
[Degree-granting institution]: Northeast Normal University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: O157.5
Article No.: 2026162
Article link: https://www.wllwen.com/kejilunwen/yysx/2026162.html