A Weibo New Word Discovery Method Combining Rules and Statistics

Published: 2018-03-19 01:30

  Topic: Weibo new words  Entry point: word-formation rules  Source: Journal of Computer Applications (《计算机应用》), 2017, Issue 04  Type: journal article


[Abstract]: Word formation in Weibo new words is highly free-form and complex. The traditional C/NC-value method identifies the boundaries of extracted new words with low accuracy and fails to recognize low-frequency new words. To address these problems, this paper proposes a Weibo new word extraction method that combines manual heuristic rules, an improved C/NC-value algorithm, and a conditional random field (CRF) model. On the one hand, the manual heuristic rules are word-formation rules derived by classifying and summarizing Weibo new words, designed from the perspectives of part of speech (POS), character category, and ideographic symbols. On the other hand, the improved C/NC-value method reconstructs the NC-value objective function by introducing statistics such as word frequency, adjacency entropy, and mutual information, and a CRF model is trained and used to recognize new words. Together these aim to improve both the accuracy of new word boundary recognition and the recognition precision for low-frequency new words. Experimental results show that, compared with traditional methods, the proposed method effectively improves the F-value of Weibo new word recognition.
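The abstract names word frequency, adjacency entropy, and mutual information as the statistics fed into the reconstructed NC-value objective, but it does not give the formula. The following is only a minimal Python sketch of how those three statistics are commonly computed for a candidate string over a list of posts. It is not the paper's implementation: the function names (`find_all`, `entropy`, `candidate_stats`), the character-level counting, and the choice of taking the minimum pointwise mutual information over all binary splits are assumptions made for illustration.

```python
import math
from collections import Counter


def find_all(text, s):
    """Yield start indices of all (possibly overlapping) occurrences of s."""
    start = text.find(s)
    while start != -1:
        yield start
        start = text.find(s, start + 1)


def entropy(counter):
    """Shannon entropy in bits of a Counter of neighbouring characters."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def candidate_stats(posts, candidate):
    """Frequency, left/right adjacency entropy, and minimum split PMI.

    Hypothetical helper: probabilities are estimated as substring counts
    divided by the total number of characters in `posts`.
    """
    if len(candidate) < 2:
        raise ValueError("candidate must have at least two characters")
    n_chars = sum(len(p) for p in posts)
    left, right = Counter(), Counter()
    freq = 0
    for post in posts:
        for i in find_all(post, candidate):
            freq += 1
            j = i + len(candidate)
            if i > 0:                # character immediately before the candidate
                left[post[i - 1]] += 1
            if j < len(post):        # character immediately after the candidate
                right[post[j]] += 1
    if freq == 0:
        return {"freq": 0, "left_entropy": 0.0,
                "right_entropy": 0.0, "pmi": float("-inf")}

    def count(s):
        return sum(1 for p in posts for _ in find_all(p, s))

    # PMI(a, b) = log2( P(ab) / (P(a) * P(b)) ); keep the weakest split.
    pmi = min(
        math.log2(freq * n_chars / (count(candidate[:k]) * count(candidate[k:])))
        for k in range(1, len(candidate))
    )
    return {"freq": freq, "left_entropy": entropy(left),
            "right_entropy": entropy(right), "pmi": pmi}
```

In this sketch, a candidate with high adjacency entropy on both sides appears in varied contexts, which suggests its boundaries are real word boundaries, while a high minimum PMI suggests its parts cohere. How the paper weights these quantities inside the NC-value objective is not stated in the abstract.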
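[Author affiliation]: School of Computer and Information Technology, Beijing Jiaotong University
[Funding]: National Natural Science Foundation of China (61370130, 61473294); Fundamental Research Funds for the Central Universities (2014RC040); International Science and Technology Cooperation Program of the Ministry of Science and Technology (K11F100010)
[Classification number]: TP391.1; TP393.092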

