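Drug Side Effect Discovery Based on Review Mining
Published: 2018-03-21 09:07
Topic: drug side effect discovery; Approach: text mining; Source: Dalian University of Technology (大连理工大学), 2014 master's thesis; Document type: degree thesis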
[Abstract]: As the harm caused by drug side effects grows, drug safety is receiving increasing attention and has become a focus of concern for both the medical community and the public, so discovering the side effects of drugs has significant theoretical and practical value. With the development of Web 2.0, many health-care social networking sites have appeared on the Internet, where people share their medication experiences and review drugs. The user reviews on these sites are increasingly abundant, and the side-effect information they contain has begun to attract researchers' attention. Mining side-effect information from user reviews is gradually becoming a fast and effective mechanism for drug side effect discovery.
When mining drug side effects from user reviews, people may describe the same side effect in different ways, and the launch of new drugs and differences among patients can give rise to new side effects. It is therefore important to recognize new side-effect names in reviews and to normalize them. To address this, the work in Chapter 3 uses a conditional random field (CRF) model to recognize side effects in reviews, normalizes the recognized side-effect names, and finally derives the side effects of each drug. The experimental results show that the CRF model can recognize both known and new side-effect names, and that normalization aggregates and merges these names, which helps side effect discovery. The effectiveness of the method is verified by comparing the mined known side effects of drugs against database records, and the work also yields a list of potential side effects for each drug, ranked by frequency of occurrence in the reviews.
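The normalization and frequency-ranking step described for Chapter 3 can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the `SYNONYMS` table, the function names, and the choice to count each side effect once per review are all assumptions made for the example, and the CRF recognizer that would produce the mentions is not shown.

```python
from collections import Counter

# Hypothetical synonym table: surface form -> standard side-effect name.
# A real system would derive this from a side-effect vocabulary or database.
SYNONYMS = {
    "headache": "headache",
    "head ache": "headache",
    "headaches": "headache",
    "nausea": "nausea",
    "nauseous": "nausea",
    "feeling sick": "nausea",
}


def normalize(mention):
    """Map a recognized mention to a standard name.

    Mentions missing from the table keep their lowercased,
    whitespace-collapsed form, so new side-effect names are not lost.
    """
    key = " ".join(mention.lower().split())
    return SYNONYMS.get(key, key)


def rank_side_effects(mentions_per_review):
    """Rank standard names by the number of reviews that mention them.

    Each name is counted at most once per review (an assumption of this
    sketch); ties are broken alphabetically to keep the output stable.
    """
    counts = Counter()
    for mentions in mentions_per_review:
        counts.update({normalize(m) for m in mentions})
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Mentions as a recognizer might return them, one list per review.
    reviews = [["Headaches", "nauseous"], ["head ache"], ["dizzy  spells", "Headache"]]
    for name, freq in rank_side_effects(reviews):
        print(name, freq)
```

Merging variants before counting is what lets differently worded mentions of one side effect reinforce each other in the ranked list, while unmapped names survive as candidates for new side effects.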
Recognizing side-effect names in user reviews is a basic but critical step in drug side effect discovery, yet it is challenging because review text is often ungrammatical and side-effect names are highly varied. To address this, Chapter 4 implements a side-effect entity recognition system that combines two methods. The first method recognizes side-effect entities by bag-of-words matching between phrases in a sliding window and names in a dictionary, taking edit distance into account during matching. The second method uses a conditional random field model, applying forward selection to find the best feature set and determining experimentally the most effective way of combining word context features. The recognition results of the two methods are then fused, and the fused result improves considerably on either method alone, showing that fusion can compensate for the weaknesses of a single method. Compared with side-effect entity recognition methods reported in other literature, the proposed method achieves comparable and possibly better recognition performance, which demonstrates the effectiveness of the fusion approach.
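The first, dictionary-based method can be illustrated with a small sketch. Everything here is an assumption for demonstration purposes: the lexicon, the whitespace tokenizer, the per-token threshold `max_dist`, the greedy token pairing, and the function names are not taken from the thesis.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a row-by-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def bag_match(window, entry, max_dist=1):
    """Order-independent match between a token window and a dictionary entry.

    Each entry token must pair with a distinct window token within
    max_dist edits. Pairing is greedy, a simplification of this sketch.
    """
    if len(window) != len(entry):
        return False
    unused = list(window)
    for term in entry:
        hit = next((w for w in unused if edit_distance(w, term) <= max_dist), None)
        if hit is None:
            return False
        unused.remove(hit)
    return True


def find_mentions(text, lexicon, max_dist=1):
    """Slide a window over the tokens and report (start, end, name) matches."""
    tokens = text.lower().split()  # naive tokenizer, assumed for brevity
    found = []
    for name in lexicon:
        entry = name.lower().split()
        n = len(entry)
        for start in range(len(tokens) - n + 1):
            if bag_match(tokens[start:start + n], entry, max_dist):
                found.append((start, start + n, name))
    return sorted(found)


if __name__ == "__main__":
    lexicon = ["stomach pain", "nausea", "dry mouth"]  # hypothetical dictionary
    print(find_mentions("I had pain stomach and sever nausia after a week", lexicon))
```

Matching tokens as a bag tolerates reordered phrases such as "pain stomach", and the edit-distance threshold tolerates misspellings such as "nausia". With very short tokens a fixed threshold of one edit can over-match, so a threshold that scales with token length may be a safer choice in practice.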
[Degree-granting institution]: Dalian University of Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP311.13; R96
Article ID: 1643177
Article link: https://www.wllwen.com/yixuelunwen/yiyaoxuelunwen/1643177.html