无重叠约束近似模式匹配
发布时间:2018-07-31 07:15
【摘要】:模式匹配(也称为串匹配)是计算机科学中基本问题之一,已经广泛地应用到了生物研究、音乐信息检索和序列模式挖掘等各个领域。在模式匹配中,有的仅仅考虑最后一个模式子串在序列中匹配的位置,这种模式匹配称为宽松模式匹配;更为挑战性的问题是对每个模式串均考虑其在序列串中匹配的位置,而这种模式匹配称为严格模式匹配。严格模式匹配中近似模式匹配允许模式串与所匹配的序列串之间存在一定差异,增加了匹配的灵活性。本文研究的是具有Hamming距离的近似模式匹配。无重叠约束是要求任何两个出现中不能在相同位置处使用序列中相同字符。具有无重叠约束的模式匹配问题是指给定模式P在序列S中的所有出现集合中找到最大子集C,使得集合C中任意两个出现均是无重叠的出现。目前研究的模式匹配多为精确匹配,精确匹配要求每个字符必须匹配,一定程度上限定了匹配的灵活性。为此本文提出了无重叠约束的近似模式匹配问题(Approximate Pattern matching with Non-Overlapping constraints,APNO)。我们运用网树结构解决该问题,提出了NETLOR和NETROL算法,并对其改进提出了NETLORE和NETROLE算法。该算法在求解时,先将模式匹配问题转换为一棵网树,然后在网树中每次查找树根叶子路径,并将该路径所对应的解作为一个无重叠近似出现,之后依据树根叶子路径在网树的局部范围内进行快速剪枝,为下一次查找树根叶子路径做准备。最后大量实验验证了本文提出的四个算法的正确性,并且验证了NETLORE是更为高效的算法。
[Abstract]:Pattern matching (also called string matching) is one of the basic problems in computer science, which has been widely used in biological research, music information retrieval and sequential pattern mining and other fields. In pattern matching, some only consider the position of the last pattern string matching in the sequence, this pattern matching is called loose pattern matching; the more challenging problem is that each pattern string considers its matching position in the sequence string. This pattern matching is called strict pattern matching. In strict pattern matching, approximate pattern matching allows some difference between pattern string and matching sequence string, which increases the flexibility of matching. In this paper, approximate pattern matching with Hamming distance is studied. No overlap constraint requires any two occurrences not to use the same characters in the sequence at the same location. The problem of pattern matching with no overlap constraint is that given pattern P finds the largest subset C in all appearance sets of sequence S so that any two occurrences in set C are non-overlapping. At present, the pattern matching is mostly precise matching, which requires each character to match, which limits the flexibility of matching to a certain extent. In this paper, an approximate pattern matching problem without overlapping constraints, (Approximate Pattern matching with Non-Overlapping constraints, is proposed. We use the net tree structure to solve this problem, propose NETLOR and NETROL algorithms, and propose NETLORE and NETROLE algorithms to improve them. In the algorithm, the pattern matching problem is first transformed into a net tree, and then the root leaf path of the tree is searched in the network tree each time, and the corresponding solution of this path appears as a non-overlapping approximation. Then pruning is carried out in the local area of the web tree according to the tree root leaf path to prepare for the next search of the tree root leaf path. Finally, a large number of experiments verify the correctness of the proposed four algorithms, and verify that NETLORE is a more efficient algorithm.
【学位授予单位】:河北工业大学
【学位级别】:硕士
【学位授予年份】:2015
【分类号】:TP391.1
本文编号:2154835
[Abstract]:Pattern matching (also called string matching) is one of the basic problems in computer science, which has been widely used in biological research, music information retrieval and sequential pattern mining and other fields. In pattern matching, some only consider the position of the last pattern string matching in the sequence, this pattern matching is called loose pattern matching; the more challenging problem is that each pattern string considers its matching position in the sequence string. This pattern matching is called strict pattern matching. In strict pattern matching, approximate pattern matching allows some difference between pattern string and matching sequence string, which increases the flexibility of matching. In this paper, approximate pattern matching with Hamming distance is studied. No overlap constraint requires any two occurrences not to use the same characters in the sequence at the same location. The problem of pattern matching with no overlap constraint is that given pattern P finds the largest subset C in all appearance sets of sequence S so that any two occurrences in set C are non-overlapping. At present, the pattern matching is mostly precise matching, which requires each character to match, which limits the flexibility of matching to a certain extent. In this paper, an approximate pattern matching problem without overlapping constraints, (Approximate Pattern matching with Non-Overlapping constraints, is proposed. We use the net tree structure to solve this problem, propose NETLOR and NETROL algorithms, and propose NETLORE and NETROLE algorithms to improve them. In the algorithm, the pattern matching problem is first transformed into a net tree, and then the root leaf path of the tree is searched in the network tree each time, and the corresponding solution of this path appears as a non-overlapping approximation. Then pruning is carried out in the local area of the web tree according to the tree root leaf path to prepare for the next search of the tree root leaf path. Finally, a large number of experiments verify the correctness of the proposed four algorithms, and verify that NETLORE is a more efficient algorithm.
【学位授予单位】:河北工业大学
【学位级别】:硕士
【学位授予年份】:2015
【分类号】:TP391.1
【参考文献】
相关期刊论文 前7条
1 柴欣;贾晓菲;武优西;江贺;吴信东;;一般间隙及一次性条件的严格模式匹配[J];软件学报;2015年05期
2 武优西;刘亚伟;郭磊;吴信东;;子网树求解一般间隙和长度约束严格模式匹配[J];软件学报;2013年05期
3 黄国林;郭丹;胡学钢;;基于通配符和长度约束的近似模式匹配算法[J];计算机应用;2013年03期
4 李艳;孙乐;朱怀忠;武优西;;网树求解有向无环图中具有长度约束的简单路径和最长路径问题[J];计算机学报;2012年10期
5 黄国林;郭丹;胡学钢;;求解近似模式匹配的启发式算法[J];计算机科学与探索;2013年01期
6 武优西;吴信东;江贺;闵帆;;一种求解MPMGOOC问题的启发式算法[J];计算机学报;2011年08期
7 陆丽娜,魏恒义,杨怡玲,管旭东;Web日志挖掘中的序列模式识别[J];小型微型计算机系统;2000年05期
,本文编号:2154835
本文链接:https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2154835.html