
Research on a Metadata Standard for Traditional Chinese Medicine Literature Based on Dublin Core (DC)

Published: 2018-04-21 05:03

  Topics: TCM literature metadata + ancient TCM books; Source: master's thesis, China Academy of Chinese Medical Sciences, 2013


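[Abstract (摘要)]: Over thousands of years of development, traditional Chinese medicine (TCM) has produced a rich variety of medical schools and left behind a vast body of ancient and modern TCM literature. TCM is a precious cultural heritage, and its literature is the link that carries this heritage forward, recording millennia of physicians' wisdom and medical experience. Faced with such enormous literature resources, users need efficient retrieval and knowledge discovery, while literature managers must classify and organize the materials, manage rights, evaluate resources, and preserve them properly. From its creation, transmission, and storage to its eventual disappearance, a TCM document passes through many stages of collection, processing, and use. Its life cycle carries a great deal of information, and the past facts accumulated as a document is formed and used are as worth recording as the content of the document itself. Researchers are working on how to query massive TCM literature collections according to specific needs and return results that are as precise as possible. Retrieval should not rely only on the mechanical keyword matching of traditional search engines, which ignores semantics and often returns empty, wrong, or off-target results. The application of metadata has done much to address this problem.

In October 2008, the Institute of Information on Traditional Chinese Medicine of the China Academy of Chinese Medical Sciences began to take part in the traditional medicine informatics standardization work of ISO/TC215 and followed its progress closely. It submitted to ISO/TC215 an international standard proposal for TCM informatics, the "TCM Dataset Metadata Standard", and in October the proposal was renamed the "TCM Literature Metadata Standard". ISO/TC215 opened the ballot on the "TCM Literature Metadata Standard" proposal in November 2011, and in May 2012 ISO formally approved the "TCM Literature Metadata" standard as a work item. This was the first time a Chinese TCM informatics standard was approved as an ISO work item.

This project addresses the approved TCM literature metadata standard. It starts from the standard's design principles and development methods and, following those principles and methods, builds a complete TCM literature metadata standard system.

First, the research background of TCM standards development was surveyed. Existing foreign medical metadata were reviewed, and the state of domestic medical metadata research was investigated in library and information science and on the Internet.

Metadata for describing information resources include the MARC format for traditional printed documents, DC metadata for networked information resources, and MODS, a third kind of metadata that lies between MARC and DC. Metadata for medical information include "ISO 13119 Health informatics - Clinical knowledge resources - Metadata", developed by ISO's technical committee on health informatics; OhioLINK medical metadata; the Medical Core Metadata (MCM) developed by Oregon Health Sciences University in the United States; and CISMeF, a quality-controlled subject gateway project launched in 1995 by Rouen University Hospital (RUH) in France. These published medical metadata standards all draw heavily on, or reuse, DC.

Combined searches of academic online databases and the Internet showed considerable domestic research on metadata in general. Research on medical metadata is very scarce, however, and the TCM field has received almost no attention. No authoritative body in China has led the drafting and official release of a medical metadata standard, so the sharing of TCM literature lacks solid support. This project is therefore worth researching and developing.

Second, the purpose and significance of a TCM literature metadata standard were analyzed from the standpoints of resource use and protection. The functions of metadata were summarized, and the reasons for designing new metadata on the basis of DC were analyzed.

TCM literature is held in scattered locations. More than 12,000 surviving TCM works dating from before 1949 are dispersed among specialized libraries across the country, where they are kept as each library's most prized holdings and are rarely used. Scholars still work by hand at every stage of collating and studying the literature. As computer technology extends into document management and scanning technology develops, digitization makes reading far more convenient, and ancient texts can be brought back to light while being well protected. Many TCM literature retrieval platforms and databases already exist or are planned in China. Information technology for using literature has advanced, but the theory that supports literature use lags behind it. Standardizing metadata is an important part of building that theory. TCM literature metadata lays the foundation for the normalized description of TCM literature resources. It helps produce clear, complete, simple, and understandable descriptive records. It also effectively supports the collection, custody, and use of TCM literature, improves TCM literature retrieval, and is important for the systematic protection and in-depth use of TCM literature resources. Basing the metadata on DC avoids the many cumbersome fixed-length fields of the MARC format and makes the cataloguing interface simple and intuitive. Professional and non-professional cataloguers alike can then take part in cataloguing, which better suits the organization of huge networked information resources.

Third, a metadata framework was designed, the general and specific principles of metadata scheme design were analyzed, and the design process for TCM literature metadata was planned.

TCM literature metadata is divided into seven subsets according to the activities in the TCM literature life cycle and the different angles of description:
(1) Identification information subset: basic information on external features, including name, identifier, creator, publisher, and so on.
(2) Content information subset: information describing the internal features of TCM literature, including description, subject, and so on.
(3) Distribution information subset: information on how users obtain and collect the literature resource.
(4) Quality information subset: quality information on the preservation status of the literature resource.
(5) Restriction information subset: restrictions on obtaining and using the resource and its metadata.
(6) Maintenance information subset: information on the care and maintenance of the literature resource.
(7) Relation information subset: reference information on the relationships between resources.

Six general principles for designing metadata standards were summarized: (1) simplicity and applicability; (2) specificity and generality; (3) interoperability and ease of conversion; (4) flexibility and extensibility; (5) user needs; (6) conformance to existing standards.

Beyond the general principles, four specific principles were derived for developing metadata in a particular domain: (1) resource analysis; (2) extension (element extension and qualification); (3) element definition; (4) markup.

Fourth, the resources relevant to TCM literature metadata were analyzed, and detailed definitions were given for the objects and units of description. "Literature" is defined broadly. Besides TCM, other traditional medicine literature can also be described with this metadata. Once a physical literature resource is digitized, both the digitized text, images, or other formats and the attributes of the physical item itself must be described, and the two descriptions are combined and inseparable. The specific unit of description is determined by actual need.

Fifth, the element set, element definitions, and cataloguing rules of TCM literature metadata were specified in detail and presented in two forms, a summary and a dictionary. A summary of the element set and its qualifiers is shown in the following table:
TCM literature metadata retains the DC metadata element set and also includes elements specific to the TCM domain.
Reuse of DC elements such as Title, Type, Creator, Subject, Description, Date, Identifier, Language, and Relation;
Refinement of DC elements according to the characteristics of the TCM domain, for example refining DC Title into Title on the Fore-edge (版心题名), Title on the Inside Cover (内封题名), Title on the Book Cover (书衣题名), and Title on the First Page of Text (卷端题名);
Addition of elements specific to TCM, such as physicians of past dynasties and medical schools.

Sixth, a suitable Web description language was chosen as the markup language for this metadata, so that the metadata can be used on the Web.

RDF (Resource Description Framework) is a language for describing Web resources. RDF uses XML syntax and RDF Schema (RDFS) to express metadata as a data model. The RDF triple data model involves three types of object:
● Resource. Everything described in RDF is called a resource.
● Property. A property describes a resource's external features, its content, or its relationships with other resources.
● Statement. A statement expresses a resource's property and that property's value in a sentence of a fixed pattern. A statement corresponds to a natural-language sentence: the resource corresponds to the subject, the property to the predicate, and the property value to the object.

Seventh, this metadata was compared with HICKR, the authoritative international metadata standard for clinical knowledge resources, to discuss its uniqueness and irreplaceability.

Finally, the main work of this study is summarized, and the application prospects of TCM literature metadata are discussed.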
[Abstract]: TCM is a precious cultural heritage, and traditional Chinese medicine literature serves as a link in passing that heritage on, recording thousands of years of physicians' wisdom and medical experience. Facing huge literature resources, users need efficient retrieval, utilization, and knowledge discovery.
Between its formation, transmission, and storage and its eventual disappearance, Chinese medical literature passes through many stages of collection, processing, and utilization.

In October 2008, the Institute of Information on Traditional Chinese Medicine of the China Academy of Chinese Medical Sciences began to participate in the traditional medicine informatics standardization work of ISO/TC215 and has followed it closely. It submitted the proposal "TCM Dataset Metadata Standard", which was renamed "TCM Literature Metadata Standard" in October 2011. ISO/TC215 started the ballot on the "TCM Literature Metadata Standard" proposal in November 2011, and in May 2012 ISO formally approved it as a work item, the first Chinese TCM informatics standard to be approved in ISO.

This study starts from the design principles and development methods of the approved TCM literature metadata standard and, following those principles and methods, establishes a complete TCM literature metadata standard system.

First, the research background of TCM standards development was surveyed, existing foreign medical metadata were reviewed, and the current state of domestic medical metadata research was investigated in library and information science and on the Internet.

Metadata for describing information resources include the MARC format for traditional printed documents, DC metadata for networked information resources, and MODS, a third kind of metadata that lies between MARC and DC. Metadata for medical information include "ISO 13119 Health informatics - Clinical knowledge resources - Metadata", developed by ISO's technical committee on health informatics; OhioLINK medical metadata; the Medical Core Metadata (MCM) of Oregon Health Sciences University; and CISMeF, a quality-controlled subject gateway project launched in 1995 by Rouen University Hospital (RUH). These published medical metadata standards all draw heavily on, or reuse, DC.

Combined searches of academic online databases and the Internet showed that there is considerable domestic research on metadata in general, very little on metadata in the medical field, and almost none in TCM. No authoritative body in China has led the drafting and official release of a medical metadata standard, which leaves the sharing of TCM literature without strong support. This project is therefore worth researching and developing.

Second, the purpose and significance of a TCM literature metadata standard were analyzed in terms of resource utilization and protection, the functions of metadata were summarized, and the reasons for designing the new metadata on the basis of DC were analyzed.

More than 12,000 TCM works dating from before 1949 survive, dispersed among specialized libraries throughout the country, where they are still kept as each library's most prized holdings and rarely used. Scholars still work manually on the collation and study of this literature. With the spread of computer technology in document management and the development of scanning technology, digitization brings great convenience to readers, and ancient texts can be brought back to light while being well protected.
The standardization of metadata is an important part of the theory of literature utilization.

Third, a metadata framework was designed, the general and specific principles of metadata design were analyzed, and the design process for TCM literature metadata was planned.

According to the activities in the life cycle of TCM literature and the different angles of description, TCM literature metadata is divided into seven subsets:

(1) Identification information subset: basic information on external features, including name, identifier, creator, publisher, and so on.

(2) Content information subset: information describing the internal features of TCM literature, including description, subject, and so on.

(3) Distribution information subset: information on how users obtain and collect the literature resource.

(4) Quality information subset: quality information on the preservation status of the literature resource.

(5) Restriction information subset: restrictions on obtaining and using the resource and its metadata.

(6) Maintenance information subset: information on the care and maintenance of the literature resource.

(7) Relation information subset: reference information on the relationships between resources.
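The seven-subset partition can be pictured as a record type with one slot per subset. The sketch below is an illustration only. The class name, attribute names, and element names in the usage lines are hypothetical and are not taken from the standard. Each subset is modeled as a plain mapping from element name to a list of values, so that repeatable elements need no special handling.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List

# Each subset maps an element name to a list of values, so repeatable
# elements (for example several creators) need no special handling.
Subset = Dict[str, List[str]]


@dataclass
class TCMLiteratureRecord:
    """Hypothetical record layout with one slot per metadata subset."""
    identification: Subset = field(default_factory=dict)  # title, identifier, creator, publisher, ...
    content: Subset = field(default_factory=dict)         # description, subject, ...
    distribution: Subset = field(default_factory=dict)    # how users obtain and hold the resource
    quality: Subset = field(default_factory=dict)         # preservation status
    restriction: Subset = field(default_factory=dict)     # limits on using the resource and its metadata
    maintenance: Subset = field(default_factory=dict)     # care and maintenance of the resource
    relation: Subset = field(default_factory=dict)        # links to other resources

    def add(self, subset: str, element: str, value: str) -> None:
        """Append one value to an element inside the named subset."""
        if subset not in {f.name for f in fields(self)}:
            raise KeyError(f"unknown subset: {subset}")
        getattr(self, subset).setdefault(element, []).append(value)

    def populated_subsets(self) -> List[str]:
        """Names of subsets holding at least one element, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name)]


rec = TCMLiteratureRecord()
rec.add("identification", "title", "Shanghan Lun")
rec.add("quality", "preservationStatus", "illustrative note")
print(rec.populated_subsets())  # ['identification', 'quality']
```

Keeping the subsets as separate slots follows the life-cycle view described above. A cataloguer recording preservation data would touch only `quality`, and an access-control component would read only `restriction`.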

The six general principles for designing a metadata standard are summarized as follows: (1) simplicity and applicability;
(2) specificity and generality; (3) interoperability and ease of conversion;
(4) flexibility and extensibility;
(5) user requirements;
(6) compliance with existing standards.

In addition to the general principles, four specific principles are summarized for developing metadata in a particular domain: (1) resource analysis; (2) extension (element extension and qualification); (3) element definition; (4) markup.

Fourth, the resources relevant to TCM literature metadata were analyzed, and detailed definitions were given for the objects and units of description.

"Literature" is used in a broad sense;
Besides TCM literature, other traditional medicine literature can also be described with this metadata;
When a physical document is digitized, both the digitized resource (in formats such as digital text or images) and the attributes of the physical item itself must be described, and the two descriptions are combined and cannot be separated;
The specific unit of description is determined by actual need.
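One way to keep the physical item and its digitized copy "combined and inseparable" is to describe each one separately and cross-link them. The sketch below is written under that assumption. It uses the real DCMI terms `dcterms:hasFormat` and `dcterms:isFormatOf` as the link, but choosing those terms is this sketch's assumption and not a rule stated in the thesis. The class, function names, identifiers, and values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Description:
    """One described resource: an identifier plus element name -> values."""
    identifier: str
    elements: Dict[str, List[str]] = field(default_factory=dict)


def link_digitized_copy(physical: Description, digital: Description) -> None:
    """Cross-link a physical item and its digitized copy in both directions.

    dcterms:hasFormat and dcterms:isFormatOf are real DCMI terms; using them
    for this link is an assumption of the sketch, not a rule of the thesis.
    """
    physical.elements.setdefault("dcterms:hasFormat", []).append(digital.identifier)
    digital.elements.setdefault("dcterms:isFormatOf", []).append(physical.identifier)


def are_linked(physical: Description, digital: Description) -> bool:
    """True only when both directions of the link are recorded."""
    return (digital.identifier in physical.elements.get("dcterms:hasFormat", [])
            and physical.identifier in digital.elements.get("dcterms:isFormatOf", []))


# Hypothetical identifiers and values, for illustration only.
book = Description("urn:example:book:0001", {"dc:format": ["thread-bound print"]})
scan = Description("urn:example:scan:0001", {"dc:format": ["image/tiff"]})
link_digitized_copy(book, scan)
print(are_linked(book, scan))  # True
```

Each description keeps its own attributes, such as binding for the book and file format for the scan. The two-way link lets a system retrieve one from the other.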

Fifth, the element set, element definitions, and cataloguing rules of TCM literature metadata were specified and presented in two forms: a summary and a dictionary.
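In dictionary form, each element definition can double as a machine-checkable cataloguing rule. The sketch below assumes a minimal entry with name, definition, obligation, repeatability, and the DC element it refines. These attributes, the example entries, and their obligation and repeatability settings are hypothetical and are not copied from the thesis's dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ElementDefinition:
    """Hypothetical data-dictionary entry; attribute names are assumptions."""
    name: str
    definition: str
    mandatory: bool
    repeatable: bool
    refines: Optional[str] = None  # DC element this element refines, if any


def check_record(record: Dict[str, List[str]],
                 dictionary: List[ElementDefinition]) -> List[str]:
    """List rule violations for a record mapping element name -> values."""
    problems: List[str] = []
    for d in dictionary:
        values = record.get(d.name, [])
        if d.mandatory and not values:
            problems.append(f"{d.name}: mandatory element missing")
        if not d.repeatable and len(values) > 1:
            problems.append(f"{d.name}: not repeatable but has {len(values)} values")
    known = {d.name for d in dictionary}
    for name in record:
        if name not in known:
            problems.append(f"{name}: not defined in the dictionary")
    return problems


# Hypothetical entries; the real dictionary defines its own rules.
DICTIONARY = [
    ElementDefinition("title", "A name given to the resource", mandatory=True, repeatable=True),
    ElementDefinition("titleOnFirstPage", "Title as printed at the head of the first page of text",
                      mandatory=False, repeatable=False, refines="title"),
    ElementDefinition("identifier", "An unambiguous reference to the resource",
                      mandatory=True, repeatable=False),
]
record = {"titleOnFirstPage": ["a", "b"], "identifier": ["x"], "shelfColour": ["red"]}
for problem in check_record(record, DICTIONARY):
    print(problem)
```

Keeping obligation and repeatability in the dictionary itself means a single source drives both the human-readable definitions and the validation.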

A summary of the element set and its qualifiers is shown in the following table:

TCM literature metadata retains the DC metadata element set and also includes elements specific to the TCM domain.

Reuse of DC metadata elements, such as Title, Type, Creator, Subject, Description, Date, Identifier, Language, and Relation;

Refinement of DC elements according to the characteristics of the TCM domain, for example refining Title into Title on the Fore-edge, Title on the Inside Cover, Title on the Book Cover, and Title on the First Page of Text;

Addition of metadata elements specific to TCM, such as physicians of past dynasties and medical schools.
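Reusing and refining DC elements keeps the records interoperable. A system that knows only simple DC can fall back from a refined element to the element it refines, which is the Dublin Core "dumb-down" idea. The sketch below shows that fallback. The element identifiers for the refined titles and the TCM-specific elements are hypothetical. Dropping elements with no DC counterpart is also an assumption, since they could instead be mapped to something like Subject.

```python
from typing import Dict, List

# Hypothetical element names for the refinements; the thesis's own
# identifiers may differ.
REFINES: Dict[str, str] = {
    "titleOnForeEdge": "title",
    "titleOnInsideCover": "title",
    "titleOnBookCover": "title",
    "titleOnFirstPage": "title",
}

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dumb_down(record: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map a TCM record onto simple DC.

    Refined elements fall back to the DC element they refine; elements with no
    DC counterpart (e.g. historicalPhysician) are dropped; duplicates are removed.
    """
    simple: Dict[str, List[str]] = {}
    for element, values in record.items():
        target = REFINES.get(element, element)
        if target not in DC_ELEMENTS:
            continue  # domain-only element: no simple-DC equivalent in this sketch
        bucket = simple.setdefault(target, [])
        for value in values:
            if value not in bucket:
                bucket.append(value)
    return simple


record = {
    "title": ["Shanghan Lun"],
    "titleOnFirstPage": ["Shanghan Lun"],
    "titleOnBookCover": ["cover title (illustrative)"],
    "historicalPhysician": ["Zhang Zhongjing"],
    "language": ["zh"],
}
print(dumb_down(record))
# {'title': ['Shanghan Lun', 'cover title (illustrative)'], 'language': ['zh']}
```

The refined titles survive as plain titles. The TCM-specific element is lost in the simple view but remains available to TCM-aware systems.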

Sixth, a suitable Web description language was selected as the markup language for this metadata, enabling its use on the Web.

RDF (Resource Description Framework) is a language for describing Web resources. RDF uses XML syntax and RDF Schema (RDFS) to express metadata as a data model. The RDF triple data model involves three types of object:

Resource. Everything described in RDF is called a resource.

Property. A property describes a resource's external features, its content, or its relationships with other resources.

Statement. A statement expresses a resource's property and that property's value in a sentence of a fixed pattern. It corresponds to a natural-language sentence: the resource corresponds to the subject, the property to the predicate, and the property value to the object.
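To make the resource-property-value triple concrete, the sketch below writes a few statements as RDF/XML using only Python's standard `xml.etree.ElementTree`. The RDF and DC element-set namespace URIs are the real ones. The `tcm:` namespace, its `historicalPhysician` property, and the `urn:example:` identifiers are hypothetical. Only literal values are handled.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
TCM = "http://example.org/tcm-metadata#"  # hypothetical namespace for TCM-specific properties

ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)
ET.register_namespace("tcm", TCM)

# (resource URI, property URI, literal value): subject, predicate, object.
Triple = Tuple[str, str, str]


def split_property(uri: str) -> Tuple[str, str]:
    """Split a property URI into namespace and local name at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


def to_rdf_xml(triples: List[Triple]) -> str:
    """Serialize literal-valued triples as RDF/XML, one rdf:Description per resource."""
    root = ET.Element(f"{{{RDF}}}RDF")
    descriptions: Dict[str, ET.Element] = {}
    for subject, prop, value in triples:
        desc = descriptions.get(subject)
        if desc is None:
            desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": subject})
            descriptions[subject] = desc
        ns, local = split_property(prop)
        ET.SubElement(desc, f"{{{ns}}}{local}").text = value
    return ET.tostring(root, encoding="unicode")


triples = [
    ("urn:example:tcm:0001", DC + "title", "Shanghan Lun"),
    ("urn:example:tcm:0001", TCM + "historicalPhysician", "Zhang Zhongjing"),
    ("urn:example:tcm:0002", DC + "language", "zh"),
]
print(to_rdf_xml(triples))
```

Statements about the same resource are grouped under one `rdf:Description`. A property whose value is another resource rather than a literal, such as a relation to a second work, would use an `rdf:resource` attribute instead of element text. This sketch does not cover that case.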

Seventh, the uniqueness and irreplaceability of this metadata are discussed by comparing it with HICKR, the authoritative international metadata standard for clinical knowledge resources.

Finally, the main work of this study is summarized, and the application prospects of TCM literature metadata are discussed.

[Degree-granting institution]: China Academy of Chinese Medical Sciences
[Degree level]: Master's
[Year awarded]: 2013
[Classification number]: R-05



