Research on Tetraploid Haplotype Reconstruction Algorithms and Software Development
Published: 2018-12-08 10:41
[Abstract]: Genetic polymorphism originates from single nucleotide polymorphisms (SNPs), so SNP analysis is of great significance in genetics. A haplotype, composed of a sequence of SNP sites, carries more genetic information than any single SNP. Haplotype analysis and detection play an important role in understanding gene function, diagnosing complex diseases, and precisely locating the genes behind inherited traits. Unfortunately, determining haplotypes directly by biological experiment is currently too expensive, so reconstructing them computationally has great practical value. Earlier work concentrated on diploids; as research has advanced and practical demand has grown, reconstruction at higher ploidy levels has come under study. This thesis addresses tetraploid haplotype reconstruction and proposes two algorithms, EHTS and EHTD, both based on the MEC/GI model (Minimum Error Correction with Genotype Information). EHTS computes, for each site, the support of every possible arrangement of alleles and takes the arrangement with the highest support as the SNP values at that site. It iterates until every site has been assigned, which determines the haplotypes. In comparative experiments, EHTS performed well under all tested parameter settings, ran quickly, and achieved a higher reconstruction rate than the W-GA and Q-PSO algorithms. EHTD instead computes a difference degree at each site and reconstructs the haplotypes by selecting the arrangement with the smallest difference. Experiments show that it also reconstructs better than W-GA and Q-PSO, and in a small number of cases it achieves a higher reconstruction rate than EHTS.
Building on the EHTD and EHTS experiments, the thesis also presents an application for tetraploid haplotype reconstruction. The software is written in C# and is organized into an input module, a reconstruction module, and an output module. The input module reads data mainly from files. The reconstruction module is the core of the software: it integrates EHTD and EHTS and reconstructs tetraploid haplotypes efficiently. The output module displays the four reconstructed haplotypes in an output window and also writes the results to a file so they can be retained. The fragment data module follows the common fragment data format, so the software generalizes well to other data sets. In summary, the thesis studies the tetraploid haplotype reconstruction problem, proposes effective reconstruction methods, and delivers accompanying application software. This work has both research and practical value and lays a foundation for further study of tetraploid species.
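The abstract names the MEC criterion and a support-based, site-by-site selection but gives no formulas, so the following Python sketch is only an illustration under stated assumptions, not the thesis's EHTS implementation. It assumes biallelic sites coded '0'/'1', fragments written as strings with '-' for uncovered sites, and a genotype given as the number of '1' alleles at each site. The support definition used here (the number of covering fragments whose nearest partial haplotype agrees with the candidate arrangement) and all function names are hypothetical.

```python
from itertools import permutations

GAP = "-"  # assumed symbol for a site not covered by a fragment


def distance(fragment, haplotype):
    """Count covered positions where the fragment disagrees with the haplotype."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != GAP and f != h)


def mec_score(fragments, haplotypes):
    """MEC: charge each fragment its distance to the closest haplotype."""
    return sum(min(distance(f, h) for h in haplotypes) for f in fragments)


def arrangements(dosage, ploidy=4):
    """Distinct ways to place `dosage` copies of allele '1' on `ploidy` haplotypes."""
    alleles = "1" * dosage + "0" * (ploidy - dosage)
    return sorted(set(permutations(alleles)))


def reconstruct(fragments, genotype, ploidy=4):
    """Greedy site-by-site reconstruction (hypothetical support definition).

    At each site, every arrangement allowed by the genotype is scored by how
    many covering fragments agree with it, where a fragment votes through the
    partial haplotype it currently matches best. The top arrangement is kept.
    """
    haps = [""] * ploidy
    for j, dosage in enumerate(genotype):
        best, best_support = None, -1
        for arr in arrangements(dosage, ploidy):
            support = 0
            for f in fragments:
                if f[j] == GAP:
                    continue
                # Nearest partial haplotype over the sites already decided.
                d = [distance(f[:j], h) for h in haps]
                k = d.index(min(d))
                if arr[k] == f[j]:
                    support += 1
            if support > best_support:
                best, best_support = arr, support
        haps = [h + a for h, a in zip(haps, best)]
    return haps


if __name__ == "__main__":
    frags = ["01--", "1-1-", "111-", "0111", "-010", "--11"]
    geno = [2, 3, 3, 2]  # assumed dosage of allele '1' at each site
    result = reconstruct(frags, geno)
    print(result, "MEC =", mec_score(frags, result))
```

Because every candidate arrangement is drawn from the genotype, the output always matches the given allele dosages, which is the role of the genotype information in MEC/GI. A greedy pass like this resolves ties arbitrarily and offers no optimality guarantee for the MEC score.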
[Degree-granting institution]: Guangxi Normal University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: Q811.4;TP311.52
Article ID: 2368212
Article link: https://www.wllwen.com/kejilunwen/ruanjiangongchenglunwen/2368212.html