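Named Entity Recognition of Crops, Pests and Diseases, and Pesticides Based on Conditional Random Fields
Posted: 2018-05-12 13:01
Topics: pests and diseases + pesticides; Source: Transactions of the Chinese Society for Agricultural Machinery (《农业机械学报》), 2017, Issue S1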
[Abstract]: Online agricultural-technology Q&A platforms currently rely entirely on human experts to answer questions, so responses are slow and answer quality is hard to guarantee. Answering such questions automatically and building an agricultural knowledge base requires extracting "crop, pest/disease, pesticide" named-entity triples from existing Q&A data. Research on Chinese named entity recognition in agriculture is scarce, and reported accuracy is low. Based on the characteristics of crop, pest/disease, and pesticide entity names, this paper proposes a conditional random field (CRF) method for recognizing these three entity types in agricultural Q&A data. The dataset is first normalized in format and automatically segmented into words. Each segmented token is then automatically annotated with features: whether it contains specific defining words, whether it contains specific radicals, whether it is a numeral or measure word, whether it is a specific left or right boundary word, and its part of speech. A CRF model trained on the annotated data classifies each token, determining whether it belongs to a crop, pest/disease, or pesticide entity and where it sits within a compound entity name. This yields recognition of the three entity types, from which the associated triples can be built automatically. Experiments on feature combinations and context window size improved recognition accuracy and reduced model training time. Recognition accuracy for crop, pest/disease, and pesticide entities reached 97.72%, 87.63%, and 98.05%, respectively, significantly higher than existing methods.
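The feature annotation and position labeling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cue lists (`DEFINING_CHARS`, `RADICAL_CHARS`, `BOUNDARY_WORDS`), the part-of-speech codes `m` and `q` for numerals and measure words, the B-/I-/O tag scheme, and the entity type names are all assumptions chosen for illustration. The per-token feature dictionaries are in the general shape that CRF toolkits accept, but no CRF is trained here.

```python
from typing import Dict, List, Tuple

# Hypothetical cue lists, for illustration only; the paper's real lists are not given.
DEFINING_CHARS = ("病", "虫", "剂", "灵")   # characters that often end pest or pesticide names
RADICAL_CHARS = set("蚜螨蛾蝇")             # hand-picked characters carrying the 虫 radical
BOUNDARY_WORDS = {"防治", "用", "打"}        # words that tend to sit next to an entity

Token = Tuple[str, str]  # (word, part-of-speech tag) as produced by a word segmenter


def token_features(sent: List[Token], i: int, window: int = 1) -> Dict[str, object]:
    """Build features for token i, including neighbors within +/- window."""
    feats: Dict[str, object] = {"bias": 1.0}
    for off in range(-window, window + 1):
        j = i + off
        if j < 0 or j >= len(sent):
            feats[f"{off}:pad"] = True  # position falls outside the sentence
            continue
        word, pos = sent[j]
        feats[f"{off}:word"] = word
        feats[f"{off}:pos"] = pos
        feats[f"{off}:has_defining"] = any(c in word for c in DEFINING_CHARS)
        feats[f"{off}:has_radical"] = any(c in RADICAL_CHARS for c in word)
        feats[f"{off}:is_quantity"] = pos in ("m", "q")  # assumed POS codes
        feats[f"{off}:is_boundary"] = word in BOUNDARY_WORDS
    return feats


def decode_entities(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge B-/I- tagged tokens into (entity_type, text) spans.

    An I- tag that does not continue an open span of the same type is ignored.
    """
    entities: List[Tuple[str, str]] = []
    cur_type, cur_words = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            if cur_type is not None:
                entities.append((cur_type, "".join(cur_words)))
            cur_type, cur_words = tag[2:], [word]
        elif tag.startswith("I-") and cur_type == tag[2:]:
            cur_words.append(word)
        else:
            if cur_type is not None:
                entities.append((cur_type, "".join(cur_words)))
            cur_type, cur_words = None, []
    if cur_type is not None:
        entities.append((cur_type, "".join(cur_words)))
    return entities


# Made-up example sentence: "cucumber / downy mildew / disease / use / metalaxyl".
sentence: List[Token] = [("黄瓜", "n"), ("霜霉", "n"), ("病", "n"), ("用", "p"), ("甲霜灵", "n")]
tags = ["B-CROP", "B-PEST", "I-PEST", "O", "B-PESTICIDE"]

features = [token_features(sentence, i, window=1) for i in range(len(sentence))]
triple_parts = decode_entities([w for w, _ in sentence], tags)
```

Widening `window` adds more neighbor features per token, which is the trade-off between accuracy and training time that the abstract says was tuned experimentally. In a real pipeline the `tags` list would come from the trained CRF rather than being written by hand.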
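[Affiliations]: College of Information and Electrical Engineering, China Agricultural University; Shandong Laodao Network Technology Co., Ltd. (山东老刀网络科技有限公司)
[Funding]: National Natural Science Foundation of China (61502500); Beijing Natural Science Foundation (4164090); Fundamental Research Funds for the Central Universities (2017QC077)
[CLC number]: TP391.1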
Article ID: 1878716
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1878716.html