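Sum Maximum Likelihood Estimation of Phylogenetic Trees
Published: 2018-01-08 21:33
Keywords: sum maximum likelihood estimation of phylogenetic trees. Source: Shandong University, master's thesis, 2017. Document type: degree thesis
Related topics: phylogenetic tree; sum maximum likelihood estimation; bioinformatics; sequence analysis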
[Abstract]: Phylogenetic analysis is an important topic in bioinformatics. As molecular data continue to accumulate, increasing attention is paid to the information they contain. Phylogenetic trees are generally built from nucleotide or protein sequences. The commonly used methods are maximum likelihood, maximum parsimony, and distance-matrix methods. Maximum likelihood and maximum parsimony derive the tree directly from the sequences, whereas distance-matrix methods derive it from the pairwise distances between sequences, treating the two sequences with the fewest changes as neighbors. All of these methods estimate the topology and the branch lengths of the tree. Ideally every method would yield the same tree, but in practice they often do not. Among them, maximum likelihood estimation, which is grounded in probability theory, is the more accurate. However, maximum likelihood requires computing a probability for every possible topology, and as the number of sites grows the number of topologies to be analysed becomes enormous, so repeating the calculation for each one is computationally very expensive. The problem has in fact been proved to be NP-hard. In most cases a globally optimal tree estimate cannot be obtained, but a heuristic search can give a reasonably good one. B. B. Zhou et al., for example, implemented this search with a parallel algorithm, improving both the search speed and the range searched.
This thesis studies the estimation of the branch lengths of a phylogenetic tree, using sum maximum likelihood as the criterion and a particle swarm algorithm to optimize the branch lengths. The work builds on the geometric space of phylogenetic trees constructed by Billera et al., in which each topology corresponds to an orthant. Nucleotide substitution at each site is assumed to be a Markov process. Under this assumption, we compute the sum of the likelihood functions over all sites and estimate the branch lengths.
Phylogenetic trees are important for other areas of bioinformatics research. They provide a basis for exploring the origin of species and molecular evolution, and in turn gene function. Phylogenetic analysis also offers important guidance for controlling viruses and diagnosing disease. Exploring methods for estimating phylogenetic trees is therefore meaningful work.
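The branch-length estimation described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: it assumes the Jukes-Cantor (JC69) model as the Markov substitution process, a single fixed four-taxon topology ((A,B),(C,D)) so that the search stays inside one orthant of non-negative branch lengths, and a literal reading of the criterion as the plain sum of per-site likelihoods. The function names, the toy alignment, and the swarm parameters (`w`, `c1`, `c2`, bounds, particle count) are all hypothetical choices made for illustration.

```python
import math
import random

BASES = "ACGT"

def jc69(t):
    """JC69 transition matrix for branch length t (assumed model, not from the thesis)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def leaf_partial(base, t):
    """P(observed leaf base | parent state x) for each x, across a branch of length t."""
    P = jc69(t)
    j = BASES.index(base)
    return [P[x][j] for x in range(4)]

def site_likelihood(column, b):
    """Likelihood of one alignment column on the fixed tree ((A,B),(C,D)).

    b = [tA, tB, tC, tD, t_internal]; column = (baseA, baseB, baseC, baseD).
    """
    a, bb, c, d = (leaf_partial(x, t) for x, t in zip(column, b[:4]))
    left = [a[x] * bb[x] for x in range(4)]   # partial likelihoods at the (A,B) node
    right = [c[y] * d[y] for y in range(4)]   # partial likelihoods at the (C,D) node
    P = jc69(b[4])
    return sum(0.25 * left[x] * P[x][y] * right[y]
               for x in range(4) for y in range(4))

def sum_likelihood(columns, b):
    """Sum (not product) of the per-site likelihoods, as the abstract words it."""
    return sum(site_likelihood(col, b) for col in columns)

def pso_maximize(f, dim, lo=1e-6, hi=2.0, particles=20, iters=60, seed=0):
    """Basic global-best particle swarm; positions are clamped to [lo, hi]."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = max(range(particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and attraction weights (illustrative values)
    for _ in range(iters):
        for i in range(particles):
            for k in range(dim):
                vel[i][k] = (w * vel[i][k]
                             + c1 * rng.random() * (pbest[i][k] - pos[i][k])
                             + c2 * rng.random() * (gbest[k] - pos[i][k]))
                pos[i][k] = min(hi, max(lo, pos[i][k] + vel[i][k]))
            val = f(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy alignment of four sequences (made-up data), read column by column.
columns = list(zip("ACGTACGTAC", "ACGTACGTAA", "ACGAACTTAC", "ACGAACTTGC"))
best_b, best_val = pso_maximize(lambda b: sum_likelihood(columns, b), dim=5)
print("branch lengths:", [round(t, 4) for t in best_b])
print("sum of site likelihoods:", best_val)
```

Clamping each coordinate to a non-negative interval is what keeps every particle inside the orthant of the fixed topology. Searching across topologies, which the abstract identifies as the expensive part, would need moves between orthants and is outside this sketch.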
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: Q811.4; O212.1