CPACA: A Probabilistic-Model Chinese Word Segmentation Algorithm Based on the Aho-Corasick Automaton Algorithm

Published: 2018-08-28 15:49
[Abstract]: The Aho-Corasick automaton is a well-known multi-pattern string matching algorithm. When a pattern fails to match, the automaton follows fail pointers to a valid subsequent state, and there may be one or more such valid states. Exploiting this property, this paper proposes an automaton algorithm adapted to Chinese word segmentation. The algorithm uses dynamic programming to compute context matching probabilities and transitions to the best valid subsequent state, thereby combining mechanical segmentation based on string matching with methods based on statistical probability models. Experimental results show that the algorithm achieves high segmentation accuracy.
[Author affiliation]: Faculty of Engineering and Applied Science, Queen's University;
[CLC number]: TP391.1
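
The abstract's general idea, dictionary matching with an Aho-Corasick automaton followed by a dynamic-programming choice among the candidate words, can be illustrated with a minimal sketch. This is not the paper's CPACA algorithm, whose details are not given here: it assumes a simple unigram word-probability model in place of the context matching probability, and the names `build_automaton`, `segment`, `TOY_PROBS`, and `unknown_logp`, along with the toy probabilities, are hypothetical.

```python
import math
from collections import deque


def build_automaton(words):
    """Build an Aho-Corasick automaton: trie, fail links, merged outputs."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> lengths of dictionary words ending here
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(len(word))
    # Breadth-first traversal so each fail target is finished before use.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit every shorter word that also ends at this position.
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def segment(text, probs, unknown_logp=-20.0):
    """Return the max log-probability segmentation under a unigram model.

    Assumption: characters not covered by any dictionary word are emitted
    as single-character tokens with the penalty `unknown_logp`.
    """
    goto, fail, out = build_automaton(probs)
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word
    best[0] = 0.0
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        end = i + 1
        # Fallback candidate: the current character as an unknown token.
        best[end], back[end] = best[i] + unknown_logp, i
        for length in out[state]:
            start = end - length
            score = best[start] + math.log(probs[text[start:end]])
            if score > best[end]:
                best[end], back[end] = score, start
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


# Toy dictionary with made-up probabilities, for illustration only.
TOY_PROBS = {"研究": 0.3, "研究生": 0.1, "生命": 0.3, "命": 0.05, "起源": 0.25}
print(segment("研究生命起源", TOY_PROBS))  # ['研究', '生命', '起源']
```

In this sketch the automaton reports every dictionary word ending at each position in a single left-to-right scan, and the dynamic program then prefers 研究/生命/起源 over 研究生/命/起源 because its product of word probabilities is larger. A purely mechanical forward-maximum-matching segmenter would pick 研究生 first, which is the kind of ambiguity a probability model is meant to resolve.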



