Multi-order Markov Classification of Symbol Sequences
Published: 2018-08-24 18:51
[Abstract]: Classification methods based on a fixed-order Markov chain model cannot make full use of the structural features of subsequences of different orders. To address this problem, a new Bayesian classification method for symbol sequences based on a multi-order Markov model is proposed. First, a conditional probability distribution model based on multi-order Markov models is established. Second, an n-order subsequence suffix tree structure with an attached suffix table is proposed, together with an efficient tree construction algorithm that builds the multi-order conditional probability model in a single scan of the sequence set. Finally, a Bayesian classifier for symbol sequences is proposed: its training algorithm learns the weights of the models of different orders by maximum likelihood, and its classification algorithm uses the weighted conditional probabilities of each order for Bayesian prediction. A series of experiments was carried out on real sequence sets from three application domains. The results show that the new classifier is insensitive to changes in the model order. Compared with existing methods that use fixed-order models, such as support vector machines, the proposed method improves classification accuracy by more than 40% on gene sequences and speech sequences, and it can output a reference value for the optimal order of the Markov model of symbol sequences.
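The idea of combining conditional probabilities of several orders in a Bayesian classifier can be illustrated with a minimal sketch. This is not the authors' algorithm. It rests on the following assumptions: plain dictionaries stand in for the suffix tree with suffix table, the order weights are fixed to a uniform value instead of being learned by maximum likelihood, and additive smoothing (`alpha`) is used to avoid zero probabilities. The class name `MultiOrderMarkovClassifier` and all of its parameters are hypothetical.

```python
import math
from collections import defaultdict


class MultiOrderMarkovClassifier:
    """Illustrative sketch: per-class mixture of Markov models of order 0..max_order."""

    def __init__(self, max_order=2, alpha=1.0):
        self.max_order = max_order
        self.alpha = alpha            # additive smoothing constant (assumption)
        self.counts = {}              # class -> {(context, symbol): count}
        self.context_totals = {}      # class -> {context: count}
        self.priors = {}              # class -> prior probability
        self.alphabet = set()
        # Uniform placeholder weights; the paper learns these by maximum likelihood.
        self.weights = [1.0 / (max_order + 1)] * (max_order + 1)

    def fit(self, sequences, labels):
        class_sizes = defaultdict(int)
        for seq, label in zip(sequences, labels):
            class_sizes[label] += 1
            counts = self.counts.setdefault(label, defaultdict(int))
            totals = self.context_totals.setdefault(label, defaultdict(int))
            # Single pass over the sequence: update the counts of every order at once.
            for i, symbol in enumerate(seq):
                self.alphabet.add(symbol)
                for k in range(min(i, self.max_order) + 1):
                    context = tuple(seq[i - k:i])  # the k preceding symbols
                    counts[(context, symbol)] += 1
                    totals[context] += 1
        n = len(labels)
        self.priors = {c: size / n for c, size in class_sizes.items()}
        return self

    def _cond_prob(self, label, context, symbol):
        # Smoothed estimate of P(symbol | context, class).
        num = self.counts[label].get((context, symbol), 0) + self.alpha
        den = self.context_totals[label].get(context, 0) + self.alpha * len(self.alphabet)
        return num / den

    def log_likelihood(self, seq, label):
        total = 0.0
        for i, symbol in enumerate(seq):
            orders = range(min(i, self.max_order) + 1)
            # Renormalise the weights over the orders available at position i.
            norm = sum(self.weights[k] for k in orders)
            p = sum(
                self.weights[k] * self._cond_prob(label, tuple(seq[i - k:i]), symbol)
                for k in orders
            ) / norm
            total += math.log(p)
        return total

    def predict(self, seq):
        # Bayes rule in log space: argmax over classes of log prior + log likelihood.
        return max(
            self.priors,
            key=lambda c: math.log(self.priors[c]) + self.log_likelihood(seq, c),
        )
```

A classifier built this way is trained with `fit(sequences, labels)` and queried with `predict(sequence)`. Replacing the uniform `weights` with learned values is the step where the method described in the abstract would differ most from this sketch.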
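[Author affiliations]: Jinshan College, Fujian Agriculture and Forestry University; School of Mathematics and Computer Science, Fujian Normal University
[Funding]: National Natural Science Foundation of China (61672157)
[CLC number]: O211.62
Article ID: 2201717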
Article link: https://www.wllwen.com/kejilunwen/yysx/2201717.html