Research on Solving Pattern Matching Problems with Wildcards and Length Constraints and Its Applications
[Abstract]: Pattern matching with wildcards is an important research direction in pattern recognition and has attracted wide attention in computational biology, information retrieval, network security, and other fields. A wildcard is a special character that matches any character of the alphabet, and introducing wildcards into patterns makes matching more flexible. For example, in DNA sequences the promoter sequence TATA often appears downstream of the sequence CAATCT, separated from it by 30 to 50 arbitrary characters, and this is not a simple repetition. Marking a pattern composed of subsequences, such as "CAATCT[30-50]TATA...", can therefore improve sequence specificity; a wildcard segment such as [30-50] is called a gap with bounded length. In addition, the one-off condition is introduced, under which each text position may be used in at most one occurrence; it keeps the set of matched occurrences stable. This dissertation studies pattern matching with wildcards under variable-length gap constraints. Its main contents are as follows.

(1) Existing research still lacks targeted treatment of the complexity, accuracy, and completeness of solving the exact pattern matching problem with wildcards and length constraints. This dissertation uses the constraint satisfaction problem (CSP) framework to build a three-tuple model for this problem. The model formally describes the constraints and the solution space and gives eight special cases of the problem. It also states the basic properties of the problem in a unified way, including completeness under special conditions and the positional relationship between adjacent matching solutions in the text. In addition, an algorithm named FIN is proposed for exact matching of patterns with wildcards.
The FIN algorithm divides the exact matching problem for patterns with wildcards into several independent subproblems. The dissertation shows theoretically that the structure of the solution is equivalent before and after partitioning. Experimental results show that FIN obtains not only the number of matches but also the complete positions of every matching solution.

(2) For approximate pattern matching with wildcards, a heuristic algorithm named W-DPBI is proposed. It addresses two problems: the matching substrings found are of low quality, and they are easily lost. The algorithm searches the text in reverse (text inversion search) and optimizes the matching process. Experiments against the similar DP and SAIL-APPROX algorithms show that it is effective: it obtains on average 21.9% more solutions, and at most 57% more. Under certain conditions it clearly improves both the quality of approximate matching results and the ability to find them, and it is flexible and instructive in applications.

(3) Pattern matching and related algorithms are applied to computational biology. The method exploits the structural similarity between drug-related and disease-related gene sequences. It combines approximate matching with a collaborative filtering algorithm to search the collected data sources. The focus is on computing drug-disease similarity from known disease information and gene information and applying it to drug repositioning and modeling. Experimental results show that the method significantly improves the enrichment of potential therapeutic drug-disease relationships. Compared with existing classification models and with random sampling, it effectively reduces the false positive rate of predictions. Its model parameters can serve as a reference for drug development trials.
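The exact and approximate matching problems summarized above can be made concrete with a small Python sketch. It is not the dissertation's FIN or W-DPBI algorithm, whose details the abstract does not give. It is a plain backtracking and dynamic-programming illustration under stated assumptions:

- The notation `A[0-1]T` (a letter, then `[lo-hi]` wildcards, then a letter) and all function names are hypothetical.
- `occurrences` enumerates exact matches as tuples of text positions. This mirrors a CSP with one position variable per pattern letter.
- `one_off_greedy` applies the one-off condition by never reusing a text position.
- `count_approx` assumes that "approximate" means at most `k` mismatched pattern letters, which may differ from the dissertation's definition.

```python
from typing import List, Tuple

Gap = Tuple[int, int]  # (lo, hi): allowed number of wildcards between two letters
Occurrence = Tuple[int, ...]  # text position of each pattern letter


def parse_pattern(spec: str) -> Tuple[str, List[Gap]]:
    """Parse hypothetical notation like 'CAATCT[30-50]TATA' into (letters, gaps).
    Adjacent letters get gap (0, 0); leading/trailing gaps are ignored."""
    chars: List[str] = []
    gaps: List[Gap] = []
    pending: Gap = (0, 0)
    i = 0
    while i < len(spec):
        if spec[i] == "[":
            j = spec.index("]", i)
            lo, hi = map(int, spec[i + 1:j].split("-"))
            pending, i = (lo, hi), j + 1
        else:
            if chars:
                gaps.append(pending)
            chars.append(spec[i])
            pending, i = (0, 0), i + 1
    return "".join(chars), gaps


def occurrences(text: str, chars: str, gaps: List[Gap]) -> List[Occurrence]:
    """Enumerate every exact occurrence by backtracking.
    CSP view: variable x_i is the text position of letter i, its domain is
    range(len(text)), and the constraints are text[x_i] == chars[i] and
    lo <= x_i - x_{i-1} - 1 <= hi for the gap (lo, hi) before letter i.
    """
    found: List[Occurrence] = []

    def extend(partial: List[int]) -> None:
        i = len(partial)
        if i == len(chars):
            found.append(tuple(partial))
            return
        lo, hi = gaps[i - 1]
        first = partial[-1] + 1 + lo
        last = min(partial[-1] + 1 + hi, len(text) - 1)
        for pos in range(first, last + 1):
            if text[pos] == chars[i]:
                extend(partial + [pos])

    for start, ch in enumerate(text):
        if ch == chars[0]:
            extend([start])
    return found


def one_off_greedy(occs: List[Occurrence]) -> List[Occurrence]:
    """Keep occurrences so no text position is reused (one-off condition).
    Leftmost-first greedy: simple, but not guaranteed to keep the most."""
    used = set()
    chosen: List[Occurrence] = []
    for occ in sorted(occs):
        if used.isdisjoint(occ):
            chosen.append(occ)
            used.update(occ)
    return chosen


def count_approx(text: str, chars: str, gaps: List[Gap], k: int) -> int:
    """Count occurrences with at most k mismatched letters (k = 0 is exact).
    ways[e][p] = number of partial occurrences of the letters processed so far
    that end at text position p and contain exactly e mismatches."""
    n = len(text)
    ways = [[0] * n for _ in range(k + 1)]
    for p in range(n):
        e = 0 if text[p] == chars[0] else 1
        if e <= k:
            ways[e][p] = 1
    for i in range(1, len(chars)):
        lo, hi = gaps[i - 1]
        new = [[0] * n for _ in range(k + 1)]
        for p in range(n):
            cost = 0 if text[p] == chars[i] else 1
            for e in range(cost, k + 1):
                # predecessor q must satisfy lo <= p - q - 1 <= hi
                new[e][p] = sum(ways[e - cost][max(0, p - 1 - hi):max(0, p - lo)])
        ways = new
    return sum(map(sum, ways))


if __name__ == "__main__":
    chars, gaps = parse_pattern("A[0-1]T")
    occs = occurrences("AATAT", chars, gaps)
    print(occs)                                    # [(0, 2), (1, 2), (3, 4)]
    print(one_off_greedy(occs))                    # [(0, 2), (3, 4)]
    print(count_approx("AATAT", chars, gaps, 0))   # 3
    print(count_approx("AATAT", chars, gaps, 1))   # 6
```

Enumeration can return exponentially many occurrences when gaps are wide, so `count_approx` counts by dynamic programming instead. It runs in O(m·n·(k+1)·w) time for m pattern letters, text length n, and widest gap width w. The gap constraints read the same in both directions, so calling `count_approx` on the reversed text with the reversed letters and gaps returns the same count. The abstract does not state whether W-DPBI's text inversion search exploits exactly this property.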
【Degree-granting institution】: Hefei University of Technology
【Degree level】: Doctoral
【Year of degree conferral】: 2016
【Classification number】: TP391.4