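Improvements to DNA Sequence Clustering and Reliability Analysis Based on Graphical Representation
Published: 2018-01-20 04:17
Keywords: graphical representation; cluster analysis; phylogenetic tree; Bootstrap. Source: Zhejiang Sci-Tech University, 2017 master's thesis. Document type: degree thesis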
[Abstract]: Graphical representation of biological sequences gives a good visual description and captures local information, and it has become an important tool for studying biological sequences. Combining graphical representation with cluster analysis is an effective way to study the evolutionary relationships among sequences. However, how to construct more effective graphical representations and how to assess clustering reliability more accurately remain open problems. This thesis studies graphical representation and methods for assessing clustering reliability. The main contents are as follows.
(1) The thesis constructs a simplified space curve for DNA sequences based on the H curve. For long DNA sequences the curve does not drift away from the center line, and it avoids overlap and self-crossing. It is convenient to draw, intuitive to read, and well suited to geometric feature analysis.
(2) Building on the simplified space curve, the thesis uses geometric features of the curve (estimates of curvature and torsion) to construct a feature description of a DNA sequence. A distance matrix is computed with an improved distance measure between sequences. Cluster analysis is then carried out on that matrix, and a phylogenetic tree is built to display the clustering result.
(3) Applying the Bootstrap method directly in biology has two shortcomings. First, it disregards the fact that biological evolution is gradual and assumes that every sample is equally likely. Second, it ignores the correlation among bases within a DNA sequence and assumes that the bases are mutually independent. Building on Bootstrap, the thesis proposes an improved method for assessing the reliability of DNA sequence clustering. The method first randomly draws a given proportion of the bases of the original DNA sequence, then replaces each drawn base using a genetic algorithm. The thesis applies the improved method to assess the reliability of the evolutionary trees built by clustering. The experiments show that the accuracy of the reliability assessment improves, which indicates that the method is feasible and effective.
The thesis uses the proposed graphical representation and improved distance measure to construct distance matrices, applies the improved reliability assessment to the clustering results based on those matrices, and compares them with results obtained by other related methods. In the experiments, the proposed methods outperform the methods they are compared against. Finally, the thesis summarizes the work and outlines the problems that need deeper study.
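The resampling step of the improved reliability assessment (draw a proportion of bases, then replace each one) can be sketched as below. This is only an illustrative sketch, not the thesis's implementation: the abstract does not describe the genetic algorithm, so the replacement rule here is a plainly different stand-in (a uniformly random different base), and the names `perturb_sequence`, `replicates`, and `fraction` are hypothetical.

```python
import random

BASES = "ACGT"

def perturb_sequence(seq, fraction, rng):
    """Return a copy of seq with a random `fraction` of its positions replaced.

    Stand-in replacement rule (an assumption): each selected base is swapped
    for a different base chosen uniformly at random. The thesis replaces the
    selected bases with a genetic algorithm instead.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    k = round(fraction * len(seq))
    # Sample positions without replacement so exactly k bases are changed.
    positions = rng.sample(range(len(seq)), k)
    out = list(seq)
    for i in positions:
        out[i] = rng.choice([b for b in BASES if b != seq[i]])
    return "".join(out)

def replicates(seqs, fraction, n_rep, seed=0):
    """Build n_rep perturbed copies of a whole set of sequences."""
    rng = random.Random(seed)
    return [[perturb_sequence(s, fraction, rng) for s in seqs]
            for _ in range(n_rep)]
```

In a Bootstrap-style assessment, each replicate set would be passed through the same distance-matrix and tree-building steps as the original data, and the support for a cluster would be the share of replicates in which that cluster reappears.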
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: Q811.4; TP311.13
Article ID: 1446812
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/1446812.html