Research on Assisted Genome Assembly and Marker Genotyping Based on 2b-RAD Technology

Published: 2018-01-12 04:01

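  Article keyword: Research on assisted genome assembly and marker genotyping based on 2b-RAD technology  Source: Ocean University of China, 2015 doctoral dissertation  Document type: dissertation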


  More related topics: 2b-RAD technology, genome assembly, marker genotyping, genomic selection, Zhikong scallop


[Abstract]: Genetic resources for non-model organisms are relatively scarce, and genome-wide genetic studies in these species remain very difficult. Reduced-representation genome sequencing can be regarded as a powerful tool for genetic research in non-model organisms. It lowers sequencing cost mainly by reducing genome complexity, and it is widely used in genetic map construction, quantitative trait locus (QTL) mapping, population genetic analysis, phylogenetic analysis, and assisted genome assembly.

1. A "bridging" strategy for assisted genome assembly. This study proposes a "bridging" genome assembly strategy: 2b-RAD genotyping is first introduced into the traditional Happy experiment to build a high-density 2b-RAD map, and this map is then used to further upgrade existing contig sequences. To realize this idea, we not only optimized the traditional Happy experiment but also proposed a hierarchical assembly algorithm combined with random sampling. Simulations show that the algorithm can assemble the 35,618 BsaXI tags of the whole Arabidopsis genome into 40 contigs with a corrected N50 of 4.1 Mb (clone length 40 kb, sample size 100), and the 95,139 BsaXI tags of human chromosome 1 into 16 contigs with a corrected N50 of 14.4 Mb (clone length 40 kb, sample size 100). Analysis of real data shows that the hierarchical assembly algorithm can assemble 34,753 BsaXI tags in the Arabidopsis genome into 554 groups with a corrected N50 of 224 kb. For contig joining, the software raises the N50 of contigs from 54.1 kb to 815 kb, from 183.4 kb to 1.03 Mb, and from 552.7 kb to 3.7 Mb, with the accuracy of joins between contigs ranging from 98.1% to 98.5%. This low-cost assisted assembly scheme is expected to play an important role in assembly projects for the complex genomes of marine organisms.

2. Development and application of a reference-free genotyping algorithm. Current marker genotyping software for reduced-representation sequencing has two shortcomings: 1) it imitates reference-based genotyping methods and therefore cannot exclude the interference that genomic repeats cause in de novo genotyping; 2) it ignores the genotyping of dominant markers. This study proposes a mixed Poisson (normal) distribution model that probabilistically identifies sequences originating from repetitive regions, and incorporates this model into existing marker genotyping software to form a new genotyping algorithm, iML. Simulations based on the Arabidopsis and rice genomes show that iML has a false positive rate 12%-23% lower than the traditional ML algorithm. Validation with Arabidopsis 2b-RAD data and threespine stickleback RAD-seq data shows that iML has a false positive rate 7%-17% lower than the ML genotyping algorithm (read length 30 bp). In addition, this study developed the RADtyping software, which integrates the iML algorithm for codominant markers and also provides statistical formulas for handling dominant markers. Simulated data for an Arabidopsis pseudo-testcross F1 population show that when the mean sequencing depth of parents and progeny is 20x, genotyping accuracy for both marker types reaches 98%. Genotyping results from two real replicate libraries show a genotyping consistency of 96% for codominant markers. Sanger validation shows a genotyping accuracy of 96% for codominant markers and 97% for dominant markers, which demonstrates that RADtyping achieves high accuracy in marker genotyping.

3. Evaluation of 2b-RAD technology in genomic selection. One key prerequisite for implementing genomic selection is a large number of genome-wide genetic markers. Although 2b-RAD has a clear advantage in genotyping cost, whether the marker density it provides meets the needs of genomic selection breeding in aquatic organisms remains unknown. Based on the genomic characteristics of the Yesso scallop (including genome size, heterozygosity, and the distribution of BsaXI restriction sites), this study simulated a Yesso scallop breeding population and examined how three marker densities affect the accuracy of estimated breeding values in genomic selection: HD-SNPs (chip density), MD-SNPs (all BsaXI restriction sites), and LD-SNPs (BsaXI sites with selective bases). The analysis shows that, under different genetic backgrounds, MD-SNPs are slightly less accurate than HD-SNPs (by 3%). When heritability is around 0.3-0.5, LD-SNPs give breeding value accuracy comparable to MD-SNPs at only one tenth of the genotyping cost. A breeding population of 349 Yesso scallops from 3 families was then used to evaluate genomic selection for three traits: shell height, shell length, and shell width. The accuracy of estimated breeding values was 0.15-0.3 across families and 0.23-0.36 within families. These analyses indicate that 2b-RAD is the platform of first choice for marker genotyping in genomic selection projects for aquatic organisms.
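The assembly results in the abstract are reported as N50 values. As a reminder of what that statistic measures, here is a minimal Python sketch using the common definition: the length of the contig at which the running total of contigs, sorted from longest to shortest, first reaches half of the total assembled length. The function name `n50` and the toy lengths are illustrative assumptions; this is not code from the thesis, and the "corrected" N50 the abstract mentions may involve further steps not described here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (any consistent unit)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total
        # assembled length defines the N50.
        if running >= half_total:
            return length


# Toy example with hypothetical contig lengths in kb.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

Joining contigs increases N50 because fewer, longer sequences are needed to cover half of the assembly, which is the kind of improvement the abstract reports (for example from 54.1 kb to 815 kb).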

[Degree-granting institution]: Ocean University of China
[Degree level]: Doctorate
[Year degree granted]: 2015
[Classification number]: Q78

[Similar literature]

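Related doctoral dissertations (top 1)

1 Dou Jinzhuang (窦锦壮); Research on assisted genome assembly and marker genotyping based on 2b-RAD technology [D]; Ocean University of China; 2015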



Article ID: 1412565


Article link: https://www.wllwen.com/shoufeilunwen/jckxbs/1412565.html

