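Exploring the Extraction and Standardization of TCM Symptom Information Based on the Maximum Probability Method
Published: 2018-07-07 13:50
Topics: symptoms + text mining; Source: 《中华中医药杂志》 (China Journal of Traditional Chinese Medicine and Pharmacy), 2017, Issue 05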
[Abstract]: Objective: To explore the extraction and standardization of traditional Chinese medicine (TCM) symptom information by comparing two symptom extraction schemes based on the maximum probability method. Methods: Data analysis and processing were performed in R 3.3.2. A symptom standardization database, a symptom description lexicon, and a keyword-adjective lexicon were built from the textbooks 《诊断学》 (Diagnostics) and 《中医诊断学》 (Diagnostics of Traditional Chinese Medicine) and from 1,000 annotated inpatient medical records of pneumonia. Based on the maximum probability method, three schemes were designed: a Chinese word segmentation scheme, a direct extraction scheme, and a combined extraction scheme. The three schemes were used to extract and standardize symptom information from 2,311 pneumonia medical records, and were compared on the number of dimensions generated, the amount of manual processing required, and symptom extraction performance. Results: Both the direct extraction scheme and the combined extraction scheme effectively reduced dimensionality; the combined extraction scheme required a smaller percentage of manual processing and achieved better symptom extraction performance. Conclusion: The combined extraction scheme based on the maximum probability method can effectively extract TCM symptom information.
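The abstract names the maximum probability method but does not give its details. As a rough illustration only, maximum-probability word segmentation is commonly implemented as a dynamic program that picks the split whose words have the highest product of unigram probabilities. The sketch below is in Python rather than the R environment used in the study, and everything in it is an assumption made for illustration: the function names `max_prob_segment` and `standardize`, the toy lexicon `FREQ` with invented counts, and the tiny synonym table `STANDARD` standing in for the symptom standardization database.

```python
import math

# Hypothetical toy lexicon: word -> count. The counts are invented for illustration.
FREQ = {
    "咳嗽": 50, "咳": 5, "嗽": 1, "咳痰": 30, "痰": 20, "痰多": 15, "多": 10,
    "发热": 40, "发": 5, "热": 10, "发烧": 8,
}

# Hypothetical mapping from extracted descriptions to standard symptom names.
STANDARD = {"发烧": "发热"}


def max_prob_segment(text, freq, max_len=4, unk_count=0.5):
    """Split `text` into the word sequence with the highest product of
    unigram probabilities (computed as a sum of log-probabilities)."""
    total = sum(freq.values())
    n = len(text)
    # best[i] = (best log-probability of text[:i], start index of its last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in freq:
                count = freq[word]
            elif len(word) == 1:
                count = unk_count  # small pseudo-count for unknown single characters
            else:
                continue  # unknown multi-character strings are not candidate words
            score = best[start][0] + math.log(count / total)
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk back from the end of the text to recover the chosen words.
    words = []
    i = n
    while i > 0:
        start = best[i][1]
        words.append(text[start:i])
        i = start
    return words[::-1]


def standardize(words, mapping):
    """Replace each word by its standard name when the mapping has one."""
    return [mapping.get(w, w) for w in words]


if __name__ == "__main__":
    print(max_prob_segment("咳嗽痰多", FREQ))
    print(standardize(max_prob_segment("发烧咳嗽", FREQ), STANDARD))
```

With this toy lexicon, "咳嗽痰多" is split into "咳嗽" and "痰多" because that pair has a higher joint probability than any split into shorter words. How the direct and combined extraction schemes build on this step is not described in the abstract, so the sketch does not attempt to reproduce them.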
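[Author affiliation]: Guangzhou University of Chinese Medicine
[Funding]: Doctoral Program Fund of the Ministry of Education (No. 20114425110009)
[Classification number]: R241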
Article ID: 2105158
Article link: https://www.wllwen.com/zhongyixuelunwen/2105158.html