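Gene Similarity Analysis Based on the Ramanujan Fourier Transform
Published: 2018-01-22 08:25
Keywords: DNA sequence; alignment-free; similarity analysis; phylogenetic tree; discrete Fourier transform; Ramanujan Fourier transform. Source: Shandong University, master's thesis, 2017. Document type: degree thesis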
[Abstract]: With the rapid development of science and technology, the volume of gene and protein data available to researchers keeps growing, and the focus of bioinformatics has gradually shifted from acquiring and accumulating data to analyzing and interpreting it. This mass of biological data holds a wealth of biological information, and extracting as much of that information as possible is meaningful work. A growing number of researchers in biology, medicine, and pharmacy recognize the practical value and importance of bioinformatics, and many mathematicians, computer scientists, and other researchers from outside the field have been drawn to this emerging interdisciplinary area. Similarity analysis of biological sequences is one of its most basic and important topics; for example, research on molecular evolution and gene identification builds on it. Sequence alignment is the traditional method for similarity analysis of biological sequences, but it requires many user-defined parameters and is computationally expensive for long sequences. Alignment-free methods were proposed as an improvement on and complement to alignment, and have quickly become an active research topic in similarity analysis.
This thesis takes gene sequences as its object of study. Building on the Voss mapping and the Ramanujan Fourier transform (RFT), it improves on existing methods to obtain a new alignment-free method, constructs phylogenetic trees, and carries out similarity analysis. The method is based on the RFT power spectrum. First, the Voss mapping converts each DNA sequence into four indicator sequences. An improved RFT algorithm is then applied to these indicator sequences to obtain a set of RFT coefficients. Similarity is measured by the Euclidean distance between RFT coefficients, and phylogenetic trees are built with the unweighted pair group method with arithmetic means (UPGMA). To compute the Euclidean distance between RFT coefficients of sequences of different lengths, shorter indicator sequences are zero-padded at the end to the length of the longest sequence in the data set, so that all gene sequences in the data set are analyzed in a space of the same dimension without losing information. The results are compared with those of the currently popular discrete Fourier transform method and of multiple sequence alignment (MSA); the new method gives relatively good clustering results and reduces the amount of computation.
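The pipeline described in the abstract (Voss mapping, zero-padding, RFT coefficients, Euclidean distance) can be sketched roughly as follows. This is a minimal illustration and not the thesis's improved RFT algorithm: it assumes the textbook finite-length form X(q) = (1 / (φ(q) N)) Σ_{n=1..N} x(n) c_q(n), where c_q(n) is the Ramanujan sum and φ is Euler's totient. The function names, the cutoff `max_q`, and the choice to concatenate the coefficients of the four indicator sequences into one feature vector are all assumptions made for the example.

```python
from math import cos, gcd, pi, sqrt


def voss_indicators(seq, length=None):
    """Voss mapping: one 0/1 indicator sequence per base (A, C, G, T),
    zero-padded on the right up to `length` (assumed >= len(seq))."""
    seq = seq.upper()
    length = len(seq) if length is None else length
    pad = [0] * (length - len(seq))
    return {b: [1 if ch == b else 0 for ch in seq] + pad for b in "ACGT"}


def totient(q):
    """Euler's totient: count of 1 <= p <= q with gcd(p, q) == 1."""
    return sum(1 for p in range(1, q + 1) if gcd(p, q) == 1)


def ramanujan_sum(q, n):
    """c_q(n): sum of cos(2*pi*p*n/q) over 1 <= p <= q coprime to q."""
    total = sum(cos(2 * pi * p * n / q)
                for p in range(1, q + 1) if gcd(p, q) == 1)
    return round(total)  # Ramanujan sums are integers


def rft(x, max_q):
    """Finite-length RFT coefficients X(1..max_q) of the sequence x.
    Assumed textbook form, not the improved algorithm from the thesis."""
    n_total = len(x)
    coeffs = []
    for q in range(1, max_q + 1):
        period = [ramanujan_sum(q, n) for n in range(1, q + 1)]  # c_q has period q
        s = sum(x[n - 1] * period[(n - 1) % q] for n in range(1, n_total + 1))
        coeffs.append(s / (totient(q) * n_total))
    return coeffs


def rft_features(seq, length, max_q):
    """Concatenate the RFT coefficients of the four indicator sequences."""
    ind = voss_indicators(seq, length)
    return [v for b in "ACGT" for v in rft(ind[b], max_q)]


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    seqs = {"s1": "ATGCGTACGT", "s2": "ATGCGTAC", "s3": "TTTTGGGGCC"}
    longest = max(len(s) for s in seqs.values())
    feats = {k: rft_features(s, longest, max_q=6) for k, s in seqs.items()}
    names = sorted(seqs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            print(a, b, euclidean(feats[a], feats[b]))
```

The resulting pairwise distances could then be handed to any average-linkage clustering routine to build a UPGMA tree, for example SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"`. Zero-padding every indicator sequence to the longest length is what puts all feature vectors in the same dimension, as the abstract describes.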
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: Q811.4; TN911.7
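Related master's theses (top 2)
1 Shan Yunlong (单云龙); Theoretical Study of the Ramanujan Fourier Transform [D]; Tianjin University of Technology; 2016
2 Wang Jian (王健); Gene Similarity Analysis Based on the Ramanujan Fourier Transform [D]; Shandong University; 2017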
Article ID: 1454207
Article link: https://www.wllwen.com/shoufeilunwen/benkebiyelunwen/1454207.html