Text Information Extraction Based on Features and Hidden Markov Models

Published: 2018-01-19 05:14

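Keywords: text chunking; feature extraction; hidden Markov model　Source: Journal of Henan University of Science and Technology (Natural Science), 2008, Issue 02　Type: journal article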

[Abstract]: Building on text chunking, this paper proposes a new text information extraction technique. It first uses the semantic and structural features of the text to extract the states that carry distinguishing features. Taking that result as a basis, it then applies an improved hidden Markov model to extract the remaining featureless states. The method was tested on 100 documents from the dataset provided by the CORA search engine development group at CMU in the United States. The results show higher precision and recall than methods based on individual words and the traditional hidden Markov model, along with improved efficiency.
[Author affiliations]: Henan Vocational and Technical College of Communications; College of Computer Science and Technology, Jilin University
[Funding]: Jilin Province Science and Technology Development Plan Project (20050527)
[Classification number]: TP391.1
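[Body snapshot]: 0 Introduction. Today's electronic resources contain a great deal of useful information, but they are weakly structured and cannot be used by traditional database-style query systems. Information extraction techniques emerged to address this problem. Information extraction (IE) is the automatic extraction of relevant or specific types of information from text. Approaches to information extraction include rule-based methods, statistical methods, and methods that combine rules and statistics.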
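The two-stage idea in the abstract (pin the blocks that surface features identify, then let a hidden Markov model label the rest) can be illustrated with a small sketch. This is not the paper's improved model, whose details are not given here. It is a standard Viterbi decoder with pinned states, and every name, rule, and number in it (`feature_state`, `toy_emit`, the state list, the unnormalized toy weights) is an assumption made up for illustration.

```python
import math
import re

# Hypothetical header fields; the paper's actual state set is not given here.
STATES = ["title", "author", "affiliation", "email", "abstract"]
NEG_INF = float("-inf")


def feature_state(block):
    """Stage 1 (assumed rules): return a state if a surface feature identifies the block."""
    if re.search(r"\S+@\S+", block):
        return "email"
    if block.lower().startswith("abstract"):
        return "abstract"
    return None


def log(p):
    """Logarithm that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else NEG_INF


def constrained_viterbi(blocks, start_p, trans_p, emit_p):
    """Stage 2: Viterbi decoding in which feature-identified blocks keep their pinned state."""
    if not blocks:
        return []
    fixed = [feature_state(b) for b in blocks]

    def emit_log(t, s):
        # A pinned block allows only its own state; other blocks use the emission model.
        if fixed[t] is not None:
            return 0.0 if fixed[t] == s else NEG_INF
        return log(emit_p(s, blocks[t]))

    # score[t][s]: best log-score of any path that ends in state s at block t
    score = [{s: log(start_p[s]) + emit_log(0, s) for s in STATES}]
    back = [{}]
    for t in range(1, len(blocks)):
        score.append({})
        back.append({})
        for s in STATES:
            prev = max(STATES, key=lambda r: score[t - 1][r] + log(trans_p[r][s]))
            score[t][s] = score[t - 1][prev] + log(trans_p[prev][s]) + emit_log(t, s)
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    last = max(STATES, key=lambda s: score[-1][s])
    path = [last]
    for t in range(len(blocks) - 1, 0, -1):
        last = back[t][last]
        path.append(last)
    path.reverse()
    return path


def toy_transitions():
    """Unnormalized toy weights that favor moving to the next field in STATES order."""
    trans = {}
    for i, r in enumerate(STATES):
        trans[r] = {}
        for j, s in enumerate(STATES):
            trans[r][s] = 0.6 if j == i + 1 else 0.2 if j == i else 0.05
    return trans


def toy_emit(state, block):
    """Toy emission scores based on block length and a keyword; not a trained model."""
    n = len(block.split())
    if state == "title":
        return 0.6 if n >= 4 else 0.1
    if state == "author":
        return 0.6 if n <= 2 else 0.1
    if state == "affiliation":
        return 0.6 if re.search(r"University|College", block) else 0.1
    return 0.05


START_P = {"title": 0.8, "author": 0.05, "affiliation": 0.05, "email": 0.05, "abstract": 0.05}
TRANS_P = toy_transitions()

blocks = [
    "Text Information Extraction with HMMs",
    "Jane Doe",
    "Example University",
    "jane@example.edu",
    "Abstract: We propose a two-stage method.",
]
labels = constrained_viterbi(blocks, START_P, TRANS_P, toy_emit)
print(list(zip(labels, blocks)))
```

In this sketch the e-mail and abstract blocks are decided by the feature rules alone, so the decoder only has to choose among paths that agree with them. In a real system the transition and emission parameters would be estimated from labeled data rather than written by hand.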
