Research on Exon Prediction Based on Digital Signal Processing Theory and Methods
Published: 2018-01-10 13:21
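Source: Nankai University, doctoral dissertation, 2014. Type: degree thesis.
Related topics: bioinformatics; exon prediction; digital signal processing; singularity detection; empirical mode decomposition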
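【Abstract】: Fast, reliable, and accurate prediction of exon locations in eukaryotic DNA sequences is an important problem in bioinformatics. In particular, accurately locating short exons is one of the main difficulties in correctly predicting both the positions and the number of exons. In exon prediction, effectively suppressing the background noise produced by intronic regions plays an important role in improving prediction accuracy for short exons. In eukaryotic genes, the relatively few protein-coding exons are split by non-coding introns into many discontinuous segments, and a large proportion of exons are very short. Because short exons lack distinctive features, predicting them accurately is very difficult. Moreover, the coding information carried by some short exons plays an important role at various stages of tumor invasion and metastasis. This dissertation proposes two exon prediction methods that improve prediction accuracy for short exons in two respects: capturing the features of short exons and suppressing the background noise from intronic regions.

Depending on their principles and characteristics, current exon prediction methods fall mainly into two categories: those based on digital signal processing and those based on databases. Using a singularity detection algorithm based on wavelet-transform modulus maxima, together with empirical mode decomposition (EMD), this dissertation develops two exon prediction methods. The work is summarized as follows:

(1) An exon prediction method based on wavelet-transform modulus-maxima singularity detection. The method first constructs a nucleotide distribution sequence. By tracking how the modulus maxima of the wavelet transform of this sequence propagate across scales, it effectively separates the exon signal from the noise produced by introns, while reconstructing the abrupt signal changes (singular points) produced by short exons with high accuracy, and thereby detects short exons accurately. HMR195 and BG570 are two widely used benchmark datasets for evaluating exon prediction methods; the dissertation uses them to evaluate both the short-exon performance and the overall prediction performance of the singularity detection method. Compared with the main existing prediction methods, its results on HMR195 and BG570 show improvements in three respects: 1) for short exons of length ≤ 50 base pairs (bp) and ≤ 200 bp, the detection rate increases by at least 12% and 8%, respectively; 2) for exon prediction as a whole, prediction accuracy increases by at least 6.8%; 3) in suppressing intronic background noise, the signal-to-noise ratio (SNR) increases by at least 74.5%.

(2) To broaden the applicability of the singularity detection method, 200 test cases were randomly selected from the NCBI GenBank database, each containing one short intron and the two adjacent short exons it separates. In predicting the exons of these 200 test cases, the singularity detection method improves prediction accuracy by at least 20.7% over the main existing prediction methods.

(3) An exon prediction method based on EMD and a modified Gabor wavelet transform. The method adopts a numerical mapping based on DNA bending stiffness and uses EMD to decompose the resulting numerical DNA sequence into several intrinsic mode functions (IMFs). The local power spectrum of the first IMF is then computed with a modified Gabor wavelet transform. Because EMD is an adaptive tool for non-stationary signals, the method can detect short-exon features that traditional methods cannot observe. In addition, because only the local power spectrum of the first IMF is computed, the method has some advantage in noise suppression. Compared with the main existing prediction methods, its results on HMR195 show improvements in two respects: 1) the SNR of exon prediction increases by at least 20.8%; 2) for short exons of length ≤ 50 bp, the detection rate increases by at least 5.3%.

The dissertation contains 60 figures, 14 tables, and 120 references.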
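The following Python sketch illustrates the general idea behind method (1), wavelet-transform modulus-maxima singularity detection; it is not the dissertation's implementation. Assumptions: the input is a generic numeric signal (the abstract does not specify how the nucleotide distribution sequence is built), the wavelet is the first derivative of a Gaussian sampled at dyadic scales 1, 2, 4 and 8, and a finest-scale maximum is kept only if some maximum lies within `s` samples of it at every coarser scale `s`, a simple stand-in for tracking maxima lines across scales. All function names and the toy signal are hypothetical.

```python
import math
import random


def dog_kernel(scale):
    """Sampled first derivative of a Gaussian, dilated to `scale` samples.

    The 1/scale factor keeps the response to an ideal step close to 1 at
    every scale, so step-like singularities keep their modulus across scales
    while the response to white noise shrinks as the scale grows.
    """
    half = int(4 * scale)
    taps = []
    for k in range(-half, half + 1):
        t = k / scale
        taps.append(-t * math.exp(-0.5 * t * t) / scale)
    return taps


def wavelet_row(x, scale):
    """W(scale, n) = sum_k x[n - k] * psi_scale[k], repeating the edge samples."""
    psi = dog_kernel(scale)
    half = len(psi) // 2
    last = len(x) - 1
    row = []
    for n in range(len(x)):
        acc = 0.0
        for j, w in enumerate(psi):
            m = min(max(n - (j - half), 0), last)  # clamp index at the borders
            acc += x[m] * w
        row.append(acc)
    return row


def modulus_maxima(row, rel_threshold=0.1):
    """Indices where |W| is a local maximum above rel_threshold * max|W|."""
    mod = [abs(v) for v in row]
    floor = rel_threshold * max(mod)
    return [n for n in range(1, len(mod) - 1)
            if mod[n] >= mod[n - 1] and mod[n] > mod[n + 1] and mod[n] > floor]


def detect_singularities(x, scales=(1, 2, 4, 8)):
    """Keep finest-scale maxima that have a maximum within `s` samples at every
    coarser scale s; noise maxima rarely persist, step-like changes do."""
    maxima = {s: modulus_maxima(wavelet_row(x, s)) for s in scales}
    return [p for p in maxima[scales[0]]
            if all(any(abs(q - p) <= s for q in maxima[s]) for s in scales[1:])]


# Hypothetical toy signal: flat background with a 30-sample block at
# positions 50..79 standing in for a short exon, plus bounded noise.
clean = [0.0] * 50 + [1.0] * 30 + [0.0] * 48
rng = random.Random(0)
noisy = [v + rng.uniform(-0.05, 0.05) for v in clean]
print(detect_singularities(clean))  # [50, 80]
print(detect_singularities(noisy))  # includes positions near 50 and 80
```

The derivative-of-Gaussian wavelet is chosen because its modulus maxima mark sharp transitions; the cross-scale persistence check is what separates those transitions from noise, which is the property the abstract attributes to the method.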
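Method (3) in the abstract combines a bending-stiffness numerical mapping, EMD, and a modified Gabor wavelet transform applied to the first IMF. The Python sketch below is an illustrative stand-in under stated assumptions, not the dissertation's method. It skips the stiffness mapping (the dinucleotide stiffness values are not given here) and takes an already-mapped numeric sequence; it uses a simplified EMD with piecewise-linear envelopes and a fixed number of sifting iterations instead of the usual cubic-spline envelopes and stopping criterion; and it uses a plain Gabor atom centred on period 3 (the well-known three-base periodicity of protein-coding DNA), because the abstract does not describe how the Gabor transform was modified. The function names, `sigma`, and the test signal are hypothetical.

```python
import math


def local_extrema(x):
    """Indices of local maxima and minima (a two-sample plateau counts once)."""
    maxima = [n for n in range(1, len(x) - 1) if x[n] >= x[n - 1] and x[n] > x[n + 1]]
    minima = [n for n in range(1, len(x) - 1) if x[n] <= x[n - 1] and x[n] < x[n + 1]]
    return maxima, minima


def linear_envelope(x, idx):
    """Piecewise-linear curve through (i, x[i]) for i in idx, held constant past the ends."""
    env = []
    j = 0
    for n in range(len(x)):
        if n <= idx[0]:
            env.append(x[idx[0]])
        elif n >= idx[-1]:
            env.append(x[idx[-1]])
        else:
            while idx[j + 1] < n:
                j += 1
            a, b = idx[j], idx[j + 1]
            env.append(x[a] + (x[b] - x[a]) * (n - a) / (b - a))
    return env


def first_imf(x, sift_iterations=10):
    """Simplified EMD: sift a fixed number of times to estimate the first IMF.

    Standard EMD uses cubic-spline envelopes and a stopping criterion; linear
    envelopes and a fixed iteration count keep this sketch dependency-free.
    """
    h = list(x)
    for _ in range(sift_iterations):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema to build both envelopes
        upper = linear_envelope(h, maxima)
        lower = linear_envelope(h, minima)
        h = [v - 0.5 * (u + lo) for v, u, lo in zip(h, upper, lower)]
    return h


def gabor_local_power(x, period=3.0, sigma=6.0):
    """Local power |sum_k x[n+k] g(k) exp(-i*w*k)|^2 with a Gaussian window g.

    w = 2*pi/period; period 3 targets the three-base periodicity of coding DNA.
    """
    w = 2 * math.pi / period
    half = int(4 * sigma)
    power = []
    for n in range(len(x)):
        re = im = 0.0
        for k in range(-half, half + 1):
            if 0 <= n + k < len(x):
                g = math.exp(-0.5 * (k / sigma) ** 2)
                re += x[n + k] * g * math.cos(w * k)
                im -= x[n + k] * g * math.sin(w * k)
        power.append(re * re + im * im)
    return power


# Hypothetical stand-in for a stiffness-mapped sequence: a slow trend everywhere
# plus a period-3 component only in positions 100..199 (the "coding" region).
signal = [0.5 * math.sin(2 * math.pi * n / 50)
          + (math.cos(2 * math.pi * n / 3 + 0.3) if 100 <= n < 200 else 0.0)
          for n in range(300)]
profile = gabor_local_power(first_imf(signal))
inside = sum(profile[130:171]) / 41
outside = (sum(profile[:61]) + sum(profile[240:])) / 121
print(inside > 100 * outside)  # compares the coding block with the flanks
```

Sifting removes the slow trend from the first IMF inside the period-3 block, so the Gabor power there is large while the trend-only flanks contribute almost nothing; this mirrors, in a very simplified form, the abstract's point that working on the first IMF alone helps suppress background noise.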
【Degree-granting institution】: Nankai University
【Degree level】: Doctorate
【Year conferred】: 2014
【Classification (CLC)】: Q811.4; TN911.7
Article ID: 1405423
Link: https://www.wllwen.com/kejilunwen/wltx/1405423.html