k-mer Frequency Analysis of Genomic Sequences and the Theoretical Prediction and Verification of Nucleosome-Binding Motifs
Published: 2018-08-30 17:57
[Abstract]: The occurrence frequencies of k-mers in genomic sequences show an evolutionary separation phenomenon. Building on this, we analyzed the differences in k-mer (k ≤ 8) usage frequency between nucleosome core sequences and nucleosome linker sequences in the yeast genome. We also analyzed the trimodal distribution of 8-mer usage frequency in the intergenic sequences of human chromosome 1 and its behavior under XY dinucleotide classification, proposed a theoretically predicted set of nucleosome-binding motifs, and compared it with experimental nucleosome occupancy data. The details are as follows.
Using the single-base-resolution nucleosome positioning annotation of the yeast genome obtained experimentally by Brogaard et al., we extracted all nucleosome core sequences and nucleosome linker sequences. We compared the relative usage frequency (RF) of k-mers (k = 4, 5, 6 and 8) in the two classes of sequences and found that for k ≥ 6 a small number of high-frequency k-mers differ markedly in usage. We then introduced the logarithm of the ratio of the k-mer relative usage frequencies in the two classes (LRF) and ranked the motifs in increasing order of this value. The longer the motif, the more pronounced the difference in usage between the two classes, and the distribution of differences gradually stabilizes once k reaches 7. Ranking the motifs in increasing order of the core-sequence 8-mer relative usage frequency shows that the 8-mer usage difference between the two classes is more significant in the region where the relative usage frequency is below 0.5. Around seven sampling points we computed the G+C content and dinucleotide content of the 8-mers preferred by core sequences and of those preferred by linker sequences. As the 8-mer relative frequency decreases, the G+C content of the corresponding motifs increases; linker sequences prefer the GG and CC dinucleotides, whereas core sequences clearly prefer CG and GC. In short, apart from a few strongly preferred motifs, most of the difference in k-mer usage between the two classes occurs in motifs with very low relative frequency, and these motifs have high G+C content. The theoretical prediction of a nucleosome-binding motif set is important for a full understanding of nucleosome positioning and chromatin remodeling, and of the structure and evolution of DNA sequences.
To explain the trimodal distribution of the relative number of 8-mer motifs as a function of frequency in human genomic sequences, we classified the 8-mers by the number of CG dinucleotides they contain. The three resulting 8-mer subsets (0CG, 1CG and 2CG) each form an independent unimodal distribution, which is not the case when classifying by any of the other 15 dinucleotides; the three peaks of the overall 8-mer distribution are the superposition of the distributions of these three CG subsets. Considering this distinctive property of 8-mer usage in DNA sequences together with experimental findings on nucleosome-binding sequences, we put forward the theoretical conjecture that the 1CG motif set is the set of nucleosome-binding motifs. To test the conjecture, we computed the relative frequencies of preferred and rare trinucleotides in the 1CG 8-mer set, constructed the nucleosome characteristic parameters Ktri(O) and Ktri(R) from them, obtained their distributions over the transcription start site (TSS) sequences of 1177 genes, and compared these with the experimentally measured nucleosome occupancy distribution. Linear-fit statistics show that sequences with a confidence level above 95% account for 89.2% of the total, and those with a confidence level above 99% account for 81.6%. The comparison supports the conjecture that the 1CG motif set is the set of nucleosome-binding motifs.
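The quantities named in the abstract (k-mer counts, RF, LRF, and the CG-based 8-mer classes) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several details are assumptions because the abstract does not define them: RF is taken here as a count normalized so that 1.0 means average usage, LRF is the natural log of the core-to-linker RF ratio with a pseudocount, and 8-mers with two or more CG dinucleotides are all placed in the "2CG" class. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def count_kmers(seq, k):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set(ALPHABET):
            counts[kmer] += 1
    return counts


def relative_frequency(counts, k, pseudocount=0.0):
    """RF for every possible k-mer (assumed definition, not from the source):
    count divided by the mean count, so RF == 1.0 means average usage."""
    n_motifs = 4 ** k
    total = sum(counts.values()) + pseudocount * n_motifs
    if total == 0:
        raise ValueError("no valid k-mers were counted")
    rf = {}
    for letters in product(ALPHABET, repeat=k):
        kmer = "".join(letters)
        rf[kmer] = (counts.get(kmer, 0) + pseudocount) * n_motifs / total
    return rf


def log_rf_ratio(core_counts, linker_counts, k, pseudocount=1.0):
    """LRF = ln(RF_core / RF_linker) per k-mer.
    The ratio direction and the pseudocount are assumptions; the pseudocount
    must be > 0 so that unseen k-mers do not produce log(0) or a zero divisor."""
    rf_core = relative_frequency(core_counts, k, pseudocount)
    rf_linker = relative_frequency(linker_counts, k, pseudocount)
    return {m: math.log(rf_core[m] / rf_linker[m]) for m in rf_core}


def cg_class(kmer):
    """Label a k-mer by its number of CG dinucleotides.
    Assumption: two or more CGs are folded into the '2CG' class."""
    n_cg = kmer.upper().count("CG")
    return f"{min(n_cg, 2)}CG"


if __name__ == "__main__":
    core = count_kmers("ACGTACGT", 4)      # toy stand-in for core sequences
    linker = count_kmers("AAAAAAAA", 4)    # toy stand-in for linker sequences
    lrf = log_rf_ratio(core, linker, 4)
    # Rank motifs in increasing order of LRF, as the abstract describes.
    ranked = sorted(lrf, key=lrf.get)
    print(ranked[0], ranked[-1])
    print(cg_class("AACGAAAA"))            # one CG dinucleotide
```

Positive LRF values in this sketch mark motifs used relatively more in the core sequences, negative values motifs used more in the linker sequences. Real input would be the concatenated or per-region sequences extracted from the nucleosome annotation rather than the toy strings used here.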
[Degree-granting institution]: Inner Mongolia University
[Degree level]: Doctoral
[Year degree awarded]: 2016
[Classification number]: Q343.23
Article ID: 2213844
Article link: https://www.wllwen.com/shoufeilunwen/jckxbs/2213844.html