
Automatic Part-of-Speech Tagging of Pre-Qin Classics Based on Multi-Feature Knowledge

Posted: 2018-05-04 01:19

  Topics: part-of-speech tagging + pre-Qin ancient texts; Source: Library and Information Service (《图书情报工作》), 2017, Issue 12


[Abstract]: [Purpose/Significance] Pre-Qin classics hold an extremely important place among ancient Chinese texts. This paper proposes a method for automatic part-of-speech tagging of pre-Qin classics so that the latent knowledge in them can be mined more accurately. [Method/Process] Using a conditional random field (CRF) model together with statistical methods to determine combined feature templates, the study arrives at an automatic part-of-speech tagging model for pre-Qin classics. [Results/Conclusions] Building on the full workflow of automatic word segmentation for pre-Qin classics, part-of-speech tagging models are obtained under both simple feature templates and combined feature templates. The model based on combined feature templates reaches a harmonic mean F of 94.79%, which gives it considerable value for wider adoption and application. During model construction, incorporating feature knowledge of character and word structure, word pinyin, and word length effectively improves the model's precision and recall.
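The abstract names the feature types (character and word structure, word pinyin, word length) and the evaluation measure (the harmonic mean F of precision and recall) but gives no templates or code. The following is a minimal Python sketch of how such per-word features might be assembled for a CRF tagger and how F is computed. The names `word_features` and `f_measure`, the `PINYIN` and `STRUCTURE` lookup tables and their values, the boundary symbols, and the single combined feature are all illustrative assumptions, not the authors' actual templates or resources.

```python
from typing import Dict, List

# Hypothetical lookup tables for illustration only; the paper's real
# pinyin and structure resources are not described in the abstract.
PINYIN: Dict[str, str] = {
    "子": "zi", "曰": "yue", "學": "xue", "而": "er",
    "時": "shi", "習": "xi", "之": "zhi",
}
STRUCTURE: Dict[str, str] = {
    "子": "single", "曰": "single", "學": "top-bottom", "而": "single",
    "時": "left-right", "習": "top-bottom", "之": "single",
}


def word_features(words: List[str], i: int) -> Dict[str, str]:
    """Build a feature dict for the word at position i of a segmented sentence."""
    word = words[i]
    prev_word = words[i - 1] if i > 0 else "<BOS>"
    next_word = words[i + 1] if i < len(words) - 1 else "<EOS>"
    return {
        # Simple (single-position) features.
        "word": word,
        "prev_word": prev_word,
        "next_word": next_word,
        # Multi-feature knowledge named in the abstract.
        "length": str(len(word)),
        "pinyin": PINYIN.get(word, "<UNK>"),
        "structure": STRUCTURE.get(word, "<UNK>"),
        # One assumed combined feature joining two positions.
        "prev_word+word": prev_word + "|" + word,
    }


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# An already-segmented example sentence (Analects 1.1), one word per item.
sentence = ["子", "曰", "學", "而", "時", "習", "之"]
feats = [word_features(sentence, i) for i in range(len(sentence))]
```

A list of feature dicts like `feats`, one per word, is the kind of input that dict-based CRF toolkits accept; template-based toolkits would express the same information as columns plus a template file.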
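[Author affiliations]: College of Information Science and Technology, Nanjing Agricultural University; Research Center for Correlation of Domain Knowledge, Nanjing Agricultural University
[Funding]: One of the research outputs of the National Social Science Fund of China major project "Construction of a Classics Knowledge Base and Humanities Computing Research Based on the Sinological Index Series (《汉学引得丛刊》)" (Grant No. 15ZDB127) and the National Social Science Fund of China youth project "Research on the Harvard-Yenching Institute Sinological Index Series" (Grant No. 12CTQ019); Nanjing Agricultural University Humanities and Social Sciences Fund project (Grant No. SKPT2016001)
[CLC number]: TP391.1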



Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1840951.html

