
Research on Information Resource Integration and Data Mining for Adverse Drug Events

Published: 2018-02-24 11:01

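  Keywords: drug ontology; adverse drug events; data mining; mapping; aggregation; model; anticancer drugs; classification. Source: Jilin University, doctoral dissertation, 2014. Document type: dissertation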


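[Abstract]: Adverse drug events are increasingly a serious public health problem. Although drugs undergo rigorous safety testing before approval, premarketing studies cannot reveal every potential adverse event. After the thalidomide incident of the 1960s, many countries introduced pharmacovigilance systems to monitor marketed drugs. The Adverse Event Reporting System (AERS) database of the U.S. Food and Drug Administration (FDA) is used mainly to detect rare serious adverse events that went unrecognized in clinical trials because of their low frequency, as well as new adverse drug events, that is, safety signals. When a potential safety problem is found in AERS, the FDA conducts epidemiological studies to evaluate the adverse event further and to determine whether the drug caused it. Based on this safety evaluation, the FDA may take regulatory action to improve product safety and protect public health, such as updating the drug labeling, restricting use of the drug, communicating new safety information to the public, or, in rare cases, withdrawing the drug from the market.
Most current research on adverse event data mining works with a small portion of the data and avoids large-scale data. Deep mining of adverse drug events with respect to mechanism of action, pharmacokinetics, and physiological effect, comparative mining of adverse events within a drug class, and integration of AERS with other data sources all remain difficult to achieve. Yet such large-scale, in-depth mining matters for revealing the adverse event profiles of different drug classes, the causes of adverse drug events, and their genetic associations, and it is an important direction for adverse event monitoring and for research on clinical medication safety. The lack of knowledge integration across data resources related to adverse drug events severely limits this research.
Knowledge integration of adverse drug event information resources is both a practical requirement for making effective use of massive medical information resources and a key problem that must be solved to improve the efficiency of adverse event data mining. The development of drug domain ontologies in recent years has created an opportunity for resource integration. However, because of the complexity of drug domain ontologies, the lack of data normalization, and the technical difficulty of mapping between domain ontologies, no satisfactory solution has been found for knowledge integration and deep aggregation of adverse drug event data, and adverse event data mining has therefore not been extended to large-scale data.
Domain ontologies can support knowledge-based decision making and reasoning, and can promote large-scale detection of drug safety signals and deep mining of adverse drug events. This study uses biomedical domain ontologies to integrate AERS-related information resources, achieving knowledge integration, information aggregation, and interoperability with other medical data resources. This enriches the resources available for adverse event data mining and promotes the detection of drug safety signals.
The main contents of this study are as follows.
(1) A mapping and aggregation model for drug domain ontologies
Ontology mapping, together with the classification and aggregation of drug information, is a prerequisite for drug-related decision support and reasoning and an important foundation for building a domain knowledge base. It is also important for deeper data mining of mechanism of action, pharmacokinetics, and physiological effect. Because domain ontologies are structurally complex and heterogeneous, the mapping method is one of the main difficulties in mapping drug domain ontologies. This study proposes a mapping and aggregation model for drug domain ontologies and, guided by that model, carries out a case study that maps RxNorm to NDF-RT (National Drug File - Reference Terminology), yielding a new method for mapping between the two ontologies and for classifying and aggregating their information. The results show that the model is feasible and that it has practical value for reusing multiple ontologies. At the semantic level, the model also refines methods of knowledge organization for information resources and supports the construction of semantic systems for digital resources. The model's limitation is that it depends on existing ontologies, so gaps in their concept relationships and classification information ultimately affect the quality of the mapping, classification, and aggregation. Other properties of domain ontologies may also help improve knowledge organization, so future work should survey domain ontologies more thoroughly and extract useful shared features to refine the model.
(2) A preliminary study of AERS drug name normalization based on RxNorm
Determining how well RxNorm covers AERS drug names is the first, and a crucial, step toward making full use of RxNorm in AERS data mining.
This study computes the proportion of all AERS drug names from 2004 to 2010 that exactly match RxNorm and compares it with the proportion for UMLS. Exact matching covered 13,565 (4.8%) of the unique AERS drug names for RxNorm and 21,272 (7.5%) for UMLS. The 2011AA release of UMLS integrates 160 source vocabularies. The UMLS coverage of AERS drug names comes from several of these source vocabularies, including RxNorm, which contributed the largest number of mappings. We then manually analyzed the 200 unmapped high-frequency AERS drug names (frequency above 1,000) and 388 randomly selected low-frequency names (frequency below 1,000) to find out why some names were not mapped. Although AERS data come from many sources and contain entry errors, the high-frequency names still reveal vocabulary usage habits specific to the domain. These findings provide a basis for improving RxNorm, and they also motivate the choice, in the following chapter, of the natural language processing tool MedEx, which is based on RxNorm.
(3) Construction of an integrated knowledge base for data mining (AERS-DM)
Building on the survey of AERS drug name normalization, we used MedEx to normalize the drug names in AERS and evaluated its performance. Based on the mapping and aggregation model for drug domain ontologies, we used a greedy algorithm to aggregate AERS drug names into the drug classification information in RxNorm and NDF-RT. Adverse events were mapped to MedDRA Preferred Term (PT) and System Organ Class (SOC) codes for aggregation. The result is an open-source integrated knowledge base for drug-adverse event data mining, AERS-DM (available at http://informatics.mayo.edu/adepedia/index.php/Download). A case study then confirmed the usefulness of the AERS-DM data set for mining.
AERS-DM integrates knowledge bases for drugs and for adverse events. With its normalized codes and aggregation, it provides additional resources for mining AERS drug safety signals and for related data mining work. The data set consists of two tables: one stores the normalized drug and adverse event information, and the other stores the aggregated drug and adverse event information. It contains 37,029,228 drug-adverse event records. The drug names in AERS were normalized to 14,490 RxNorm drug names (represented by RxNorm codes), of which 10,221 (71%) could be assigned to NDF-RT classes. For adverse events, 14,740 MedDRA PT terms were aggregated to MedDRA SOC codes, covering 76% of all PT terms in MedDRA. AERS-DM contains 4,639,613 unique pairs of RxNorm-coded drug names and MedDRA PTs, that is, unique pairs of normalized drug names and adverse events. After adverse events were aggregated by organ system, there were 205,725 drug-organ system pairs.
(4) Empirical data mining studies using AERS-DM
AERS-DM is a normalized and aggregated knowledge base for data mining. Its strengths are the normalization of drug data and the classification and aggregation of both drug data and adverse event data, with all classification and aggregation knowledge drawn from the knowledge structures of biomedical ontologies. Most traditional adverse event detection studies based on AERS target only a few drugs, and few carry out large-scale data mining. In this study, using ingredient information for commonly used anticancer drugs, we performed a large-scale systematic analysis of drug clusters by mechanism of action, physiological effect, and therapeutic intent, of adverse event clusters, and of differences in adverse events by age and sex. The analysis further confirmed the potential of AERS-DM for semantic mining.
Traditional adverse event detection relies on disproportionality measures, which quantify how unexpected a drug-adverse event association is and try to compensate for the lack of background incidence information in spontaneous reporting systems. In this study we propose a new detection method that links AERS data with electronic medical record data to obtain incidence information for adverse events and to enable large-scale comparisons across adverse drug events. The study confirms that AERS-DM, as an enhanced version of AERS, is a rich resource for data mining.
The innovations of this dissertation are as follows.
(1) Theoretical innovation
A mapping and aggregation model for drug domain ontologies is proposed. Because of the constraints of ontology development, existing domain ontologies each have their own characteristics. The proposed model exploits these differences: through ontology mapping, the classification information of one ontology complements the content of the others, so that classification and aggregation can draw on several ontologies in the same domain. This saves ontology development cost and allows ontologies to be fully reused.
(2) Methodological innovation
(i) Based on the mapping and aggregation model, a systematic classification and aggregation algorithm was developed that uses NDF-RT and RxNorm to classify and aggregate the drugs in the AERS database. The method is novel in two respects. First, it uses the rich relationships in RxNorm to infer terms that can be mapped to NDF-RT and then classified further. Second, it uses both clinical drug names and generic drug names to find the multi-axial classifications in NDF-RT, which avoids missing classifications that a mapping based on generic names alone might overlook. Compared with existing methods, this method handles more complex cases.
(ii) Natural language processing tools and biomedical ontologies were used to normalize and aggregate the large-scale AERS data, which makes large-scale signal detection of adverse drug events possible. On this basis, a new adverse event detection method was implemented that links AERS data with electronic medical record data to obtain incidence information for adverse events and to enable large-scale comparisons across adverse drug events.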
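The disproportionality measures that the abstract attributes to traditional signal detection can be illustrated with the proportional reporting ratio (PRR), one widely used measure of this kind. The sketch below is not taken from the dissertation: the function name and the report counts are hypothetical, and the abstract does not say which disproportionality measure the author used.

```python
def proportional_reporting_ratio(a: int, b: int, c: int, d: int) -> float:
    """Compute the PRR from a 2x2 table of spontaneous reports.

    a: reports that mention the drug and the event
    b: reports that mention the drug but not the event
    c: reports that mention other drugs and the event
    d: reports that mention other drugs but not the event
    """
    if a + b == 0 or c + d == 0:
        raise ValueError("each group needs at least one report")
    if c == 0:
        raise ValueError("PRR is undefined when no comparator report mentions the event")
    drug_rate = a / (a + b)    # share of the drug's reports that mention the event
    other_rate = c / (c + d)   # share of all other reports that mention the event
    return drug_rate / other_rate


# Hypothetical counts, chosen only to show the arithmetic.
prr = proportional_reporting_ratio(a=20, b=980, c=100, d=98900)
print(f"PRR = {prr:.2f}")
```

A PRR well above 1 means the event is reported disproportionately often for the drug relative to all other drugs in the database. It reflects reporting frequency only, not incidence, which is the gap the abstract's linkage to electronic medical record data is meant to address.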

[Degree-granting institution]: Jilin University
[Degree level]: Doctorate
[Year degree conferred]: 2014
[Classification number]: R95

