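A Social Approach to Constructing Scientific Terminology Ontologies
Published: 2017-12-27 21:32
Keywords: A Social Approach to Constructing Scientific Terminology Ontologies  Source: University of Science and Technology of China, 2016 doctoral dissertation  Type: Dissertation
More related articles: scientific terminology ontology, social voting, LDA, topic hierarchy, domain keyword list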
[Abstract]: In general, an ontology contains at least two elements: domain concepts and the relations among those concepts. A scientific terminology ontology is a simple form of ontology that consists of the concepts of a scientific field and the hierarchical relations among them. Such ontologies play an extremely important role in research project management, research evaluation (e.g., the Research Assessment Exercise), and similar activities, because they allow the resources of a scientific field to be classified accurately and in detail, which improves the efficiency of information retrieval. For example, in recent years the National Natural Science Foundation of China (NSFC) has received more than 170,000 grant proposals per year on average. On average, each NSFC program director must assign reviewers to more than 1,500 proposals in less than three weeks. In practice, most program directors adopt the same strategy: first group the proposals, then assign reviewers. To help program directors quickly grasp, at a macro level, the content of the proposals they are responsible for, and thereby group them more efficiently, a scientific terminology ontology is urgently needed.

Existing methods for constructing terminology ontologies fall into two categories: manual and automatic. Manual construction is usually led by domain decision makers, such as funding agency managers, journal editors, and ontology engineers. Automatic construction relies on computer algorithms that process natural language. Judged on quality and efficiency, the two categories trade off against each other. Manually constructed ontologies are generally of high quality and free of noisy data, but they are time-consuming and labor-intensive to build and demand a high level of skill from domain decision makers. Automatic construction can process large amounts of data in a short time and can be updated promptly, but the resulting ontologies are of lower quality and often contain noisy data.

To balance quality and efficiency, we propose a third approach: constructing terminology ontologies socially. This approach is feasible because of the Web 2.0 era in which we live. A wide variety of social media make it easy to bring people together online to work collaboratively. In particular, the rise of research social networks (such as ResearchGate and ScholarMate (科研之友)) lets scholars in a scientific field communicate across time and space. The essence of the social approach is to use research social networks to encourage the scholars of a scientific field to take an active part in constructing the terminology ontology, thereby reducing the burden on domain decision makers.

The research question of this dissertation is therefore: how can a scientific terminology ontology be constructed in a social way? Constructing the terminology ontology of a scientific field involves two core tasks: (1) building a domain keyword list; (2) generating the hierarchical relations among the keywords. The research objectives are threefold: (1) to propose a unified, extensible theoretical framework for the social construction of scientific terminology ontologies; (2) to design and implement a method for building a domain keyword list through social voting; (3) to design a method for generating hierarchical relations among keywords from keyword similarity and specificity.

In information systems research, behavioral science and design science are the two main paradigms. Behavioral science seeks to build and test theories that describe, explain, or predict the behavior of people and organizations, whereas design science focuses on creating and testing artifacts that extend the capabilities of people and organizations. This study follows the design science research methodology and comprises two stages, build and evaluate. In the build stage, we first propose a method for building a domain keyword list through social voting, and then design a method for generating a keyword hierarchy that integrates the LDA topic model with the subsumption hierarchy model. In the evaluation stage, we first evaluate the social voting method for building the domain keyword list with a survey. Second, we evaluate the LDA topic model component of the keyword hierarchy generation method with an experiment. Third, we evaluate the subsumption hierarchy model component with an experiment. Finally, we evaluate the terminology ontology construction method as a whole with a user study.

Theoretically, this study (1) proposes a unified, extensible theoretical framework for the social construction of scientific terminology ontologies; (2) designs a method for building a domain keyword list through social voting; (3) designs a method for generating hierarchical relations among keywords from keyword similarity and specificity. Practically, the domain keyword list construction method proposed here has been applied to the project review work of the NSFC. To our knowledge, the China National Committee for Terms in Sciences and Technologies spends a great deal of manpower and material resources every year on standardizing technical terms, mostly by hand. This study offers such organizations an alternative way to construct terminology ontologies for scientific fields.
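The abstract names the subsumption hierarchy model but does not give its details. As a rough illustration only, the sketch below applies the classic co-occurrence subsumption test: keyword x is taken as a parent of keyword y when P(x|y) is at least a threshold and P(y|x) is below 1. It is not the dissertation's method, which also draws on LDA-based similarity and specificity. The function name `subsumption_pairs`, the 0.8 threshold, and the sample keyword sets are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def subsumption_pairs(docs, threshold=0.8):
    """Return sorted (parent, child) keyword pairs.

    docs: iterable of keyword collections, one per document.
    x is a parent of y when P(x|y) >= threshold and P(y|x) < 1,
    with probabilities estimated from document co-occurrence counts.
    The threshold value is an assumption, not taken from the source.
    """
    doc_freq = defaultdict(int)   # number of documents containing a keyword
    co_freq = defaultdict(int)    # number of documents containing both keywords
    for keywords in docs:
        unique = sorted(set(keywords))
        for k in unique:
            doc_freq[k] += 1
        for a, b in combinations(unique, 2):
            co_freq[(a, b)] += 1

    pairs = []
    for (a, b), n in co_freq.items():
        # Test both directions of each co-occurring pair.
        for x, y in ((a, b), (b, a)):
            p_x_given_y = n / doc_freq[y]
            p_y_given_x = n / doc_freq[x]
            if p_x_given_y >= threshold and p_y_given_x < 1.0:
                pairs.append((x, y))
    return sorted(pairs)


# Hypothetical keyword sets, e.g. one per grant proposal.
sample_docs = [
    {"machine learning", "topic model", "LDA"},
    {"machine learning", "topic model"},
    {"machine learning", "neural network"},
    {"machine learning"},
]

for parent, child in subsumption_pairs(sample_docs):
    print(f"{parent} -> {child}")
# machine learning -> LDA
# machine learning -> neural network
# machine learning -> topic model
# topic model -> LDA
```

The output still contains the redundant edge from "machine learning" to "LDA", which is implied by the path through "topic model". A real hierarchy builder would typically prune such transitive edges, and that step is left out here to keep the sketch short.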
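[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Doctoral
[Year degree conferred]: 2016
[Classification number]: TP391.1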
Article ID: 1343262
Article link: https://www.wllwen.com/shoufeilunwen/xxkjbs/1343262.html