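Research on Disease-Assisted Triage Guidance Technology Based on Text Mining
Published: 2018-05-04 11:53
Topics: JM two-layer model + medical chief-complaint text; Source: Beijing University of Posts and Telecommunications, 2015 master's thesis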
[Abstract]: Mining valuable information from medical data is a pressing problem. As text data accumulates in the medical field, the value it holds is immense, and the demand for knowledge-discovery tools keeps growing. Patients, in addition, need remote self-service triage guidance. To address these problems, this thesis applies Bayes classification to medical triage guidance and proposes an optimized model tailored to the characteristics of medical text.
The thesis proposes a two-layer Bayes classification model based on Jelinek-Mercer (JM) smoothing. Medical text has weakly separated categories, lacks large-scale training sets, contains low-frequency words with high discriminative power, and is organized in multiple levels. The JM two-layer model uses JM smoothing to compensate for the incomplete training set, and builds a two-layer model around the multi-level structure of the text to improve classification accuracy.
Building on the theoretical analysis of the JM two-layer model, the thesis sets up an experimental platform, implements Bernoulli and multinomial Bayes classifiers in Java, constructs a two-layer model for each, and finally adds a JM smoothing module on top of the two-layer models.
The experimental results show that the two-layer Bernoulli and multinomial models reach classification accuracies of 0.894 and 0.903, which are 7.84% and 6.61% higher than the corresponding single-layer models. Adding the JM smoothing module to the two-layer models improves accuracy further, to 0.905 for the Bernoulli model and 0.912 for the multinomial model. The most accurate model in the experiments is the JM-based two-layer multinomial Bayes model.
Finally, the classifier with an accuracy of 0.912 is applied in a triage guidance system, completing the design and implementation of the application. Patients can enter their chief-complaint symptoms remotely to obtain self-service guidance. The JM two-layer multinomial Bayes classifier determines the disease category, which completes the applied study of the improved Bayesian classification model in medical triage guidance.
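The abstract does not give the smoothing formula or say how the two layers are defined, so the following is only a minimal sketch under stated assumptions, written in Python for brevity rather than the Java used in the thesis. It assumes the usual Jelinek-Mercer interpolation, P(w|c) = (1 - λ)·count(w, c)/|c| + λ·P(w|corpus), and assumes the first layer picks a coarse category and the second layer picks a disease category within it. The class names, the weight `lam = 0.3`, the category labels, and the toy complaints are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class JMMultinomialNB:
    """Multinomial naive Bayes with Jelinek-Mercer smoothing (illustrative)."""

    def __init__(self, lam=0.3):
        self.lam = lam  # hypothetical interpolation weight, not from the thesis

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.corpus_counts = Counter()           # token counts over all classes
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.corpus_counts.update(tokens)
        self.corpus_total = sum(self.corpus_counts.values())
        self.class_totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_score(self, tokens, label):
        # log prior + sum of log smoothed word likelihoods
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        class_total = self.class_totals[label] or 1  # guard against empty classes
        for w in tokens:
            p_corpus = self.corpus_counts[w] / self.corpus_total
            if p_corpus == 0.0:
                continue  # word never seen in training: ignored in this sketch
            p_class = self.word_counts[label][w] / class_total
            # JM interpolation keeps the likelihood non-zero even when the
            # word is absent from this class's training text.
            score += math.log((1 - self.lam) * p_class + self.lam * p_corpus)
        return score

    def predict(self, tokens):
        return max(self.class_counts, key=lambda c: self.log_score(tokens, c))


class TwoLayerClassifier:
    """Layer 1 picks a coarse category; layer 2 picks a class inside it."""

    def __init__(self, lam=0.3):
        self.lam = lam

    def fit(self, docs, coarse_labels, fine_labels):
        self.top = JMMultinomialNB(self.lam).fit(docs, coarse_labels)
        grouped = defaultdict(lambda: ([], []))
        for tokens, coarse, fine in zip(docs, coarse_labels, fine_labels):
            grouped[coarse][0].append(tokens)
            grouped[coarse][1].append(fine)
        # One second-layer classifier per coarse category.
        self.sub = {c: JMMultinomialNB(self.lam).fit(d, f) for c, (d, f) in grouped.items()}
        return self

    def predict(self, tokens):
        coarse = self.top.predict(tokens)
        return coarse, self.sub[coarse].predict(tokens)


# Hypothetical, already-tokenized chief complaints.
docs = [
    ["cough", "fever", "sore", "throat"],
    ["cough", "wheeze", "chest", "tightness"],
    ["stomach", "pain", "nausea"],
    ["diarrhea", "stomach", "cramps"],
]
coarse = ["respiratory", "respiratory", "digestive", "digestive"]
fine = ["upper_respiratory_infection", "asthma", "gastritis", "enteritis"]

model = TwoLayerClassifier(lam=0.3).fit(docs, coarse, fine)
print(model.predict(["wheeze", "cough"]))
```

Without smoothing, a word such as "wheeze" that never occurs in the digestive training text would drive that class's likelihood to zero; the interpolation with corpus-wide frequencies is what keeps the comparison meaningful on a small training set.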
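[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: R44; TP391.1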