
Frequent Pattern Mining in Biological Sequences Based on a Scoring Matrix

Posted: 2018-06-19 21:30

  Topics: approximate matching + wildcards; Source: Pattern Recognition and Artificial Intelligence (模式识别与人工智能), 2016, No. 10


[Abstract]: Discovering meaningful frequent patterns in biological sequences has become an important task in bioinformatics. This paper proposes a scoring-matrix-based algorithm for mining frequent patterns in biological sequences. First, an approximate-matching score matrix is constructed to handle insertion, substitution and deletion operations in pattern matching with wildcard gap constraints. Next, an approximate pattern matching method based on the score matrix is designed to obtain the number of approximate occurrences of a pattern in a sequence. Finally, a data-driven pattern generation method and an Apriori-like pruning strategy are used to avoid generating many unnecessary candidate patterns. Experiments on protein and DNA sequences show that the proposed algorithm performs better and can be used to mine frequent patterns shared by different sequences.
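The abstract names three steps: scoring approximate matches, counting approximate occurrences, and pruning candidates Apriori-style. The following is a simplified sketch of that pipeline and not the authors' algorithm. It rests on several assumptions. Unit-cost edit distance stands in for the paper's score matrix and is computed with the standard semi-global (Sellers) dynamic program. Wildcard gap constraints are not modelled. Support is taken to be the number of input sequences that contain an occurrence within k edits. The names `min_edit_cost`, `support` and `mine` are hypothetical.

```python
from typing import Dict, List

# Hypothetical sketch: unit-cost edit distance replaces the paper's score
# matrix, and wildcard gap constraints are not modelled.


def min_edit_cost(pattern: str, seq: str) -> int:
    """Fewest insertions/deletions/substitutions turning `pattern` into some substring of `seq`.

    Semi-global DP (Sellers): row 0 is all zeros, so a match may start anywhere
    in `seq`; taking the minimum of the last row lets it end anywhere.
    """
    prev = [0] * (len(seq) + 1)  # empty pattern prefix matches everywhere at cost 0
    for i, p in enumerate(pattern, start=1):
        cur = [i] + [0] * len(seq)  # column 0: delete all i pattern characters
        for j, s in enumerate(seq, start=1):
            cur[j] = min(
                prev[j - 1] + (p != s),  # match or substitution
                prev[j] + 1,             # pattern character deleted
                cur[j - 1] + 1,          # extra sequence character inserted
            )
        prev = cur
    return min(prev)


def support(pattern: str, seqs: List[str], k: int) -> int:
    """Number of sequences holding an approximate occurrence with at most k edits."""
    return sum(min_edit_cost(pattern, s) <= k for s in seqs)


def mine(seqs: List[str], min_sup: int, k: int, max_len: int) -> Dict[str, int]:
    """Level-wise (Apriori-like) growth of contiguous patterns up to max_len."""
    # Data-driven generation: only extend with symbols that occur in the input.
    alphabet = sorted(set("".join(seqs)))
    frequent: Dict[str, int] = {}
    level: List[str] = []
    for c in alphabet:
        sup = support(c, seqs, k)
        if sup >= min_sup:
            frequent[c] = sup
            level.append(c)
    for _ in range(2, max_len + 1):
        next_level: List[str] = []
        for prefix in level:  # every prefix here is already frequent
            for c in alphabet:
                cand = prefix + c
                # Apriori-like pruning: a pattern's support can never exceed
                # its suffix's, so an infrequent suffix rules the candidate out.
                if cand[1:] not in frequent:
                    continue
                sup = support(cand, seqs, k)
                if sup >= min_sup:
                    frequent[cand] = sup
                    next_level.append(cand)
        level = next_level
        if not level:
            break
    return frequent


if __name__ == "__main__":
    seqs = ["ACGTAC", "ACGAAC", "TTACGA"]
    # prints {'A': 3, 'C': 3, 'G': 3, 'AC': 3, 'CG': 3, 'ACG': 3}
    print(mine(seqs, min_sup=3, k=0, max_len=4))
```

Suffix pruning is safe here because any alignment of a pattern within k edits also yields an alignment of its suffix (or prefix) within k edits. Removing a character from the middle of a pattern has no such guarantee, so only contiguous sub-patterns are checked. With k ≥ 1, every pattern of length at most k qualifies trivially, since deleting all of its characters costs at most k. In practice only patterns longer than k would be reported.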
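[Affiliations]: School of Computer and Information, Hefei University of Technology; Department
[Funding]: Supported by the National Natural Science Foundation of China, Joint Research Fund for Overseas Chinese, Hong Kong and Macao Scholars (No.61229301), and the National Natural Science Foundation of China, Young Scientists Fund (No.61305062)
[CLC number]: TP311.13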


Article ID: 2041354


Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2041354.html


