Promoter Type Identification Based on Next-Generation Sequencing Data

Published: 2018-04-25 23:09

  Topics: promoter + histone modification; Reference: Harbin Institute of Technology, master's thesis, 2017


[Abstract]: Research on the human genome as a whole has entered the "post-genome era", in which the central goal is to reveal, explain, and mine genome function. Driven by the rapid development of sequencing technology, the functional characterization of gene expression products and epigenetic information has entered a new "large-scale, high-throughput" stage. The mechanisms that regulate gene expression are an active research topic, and because promoters are key elements of the gene expression regulatory network, identifying promoter types is a stepping stone toward a deeper understanding of the complex regulatory mechanisms of the human genome. In this work, we first preprocess the annotated gene data to obtain what we call single-gene data. Then, based on RNA-seq data, we compute gene expression in several cell lines (Hepg2, Huvec, Gm12878, K562, and H1hesc) and analyze the expression levels in each cell line. Next, exploiting the fact that RNA polymerase II (Pol II) is enriched in promoter regions, we combine Pol II ChIP-seq data with gene expression levels to identify active promoters and poised promoters, examine the expression levels of genes that contain different promoter types, and on that basis analyze alternative promoter usage in the cell lines. Finally, we divide the region spanning 1000 base pairs upstream and downstream of each transcription start site (TSS) into 10 segments of 200 base pairs each, compile the distribution of 6 histone modification signals across these segments in the cell lines H1hesc, Huvec, and Gm12878, and analyze how specifically these signals are distributed across different promoter types. Using the histone modification features of H1hesc as the training set, we train a classifier with machine learning algorithms to predict the types of candidate promoters in Huvec and Gm12878.
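The segmentation step described in the abstract (a 2000 bp window around each TSS, cut into ten 200 bp segments) can be illustrated with a short sketch. This is not the thesis code: the function names `tss_bins` and `bin_signal`, the half-open coordinate convention, the strand handling, and the overlap-weighted sum are all assumptions made for illustration, and the abstract does not say how the signal in each segment was actually quantified.

```python
from typing import Iterable, List, Tuple

FLANK = 1000     # bases on each side of the TSS (stated in the abstract)
BIN_SIZE = 200   # length of each segment (stated in the abstract)


def tss_bins(tss: int, strand: str = "+") -> List[Tuple[int, int]]:
    """Return ten half-open (start, end) segments covering TSS-1000 .. TSS+1000.

    Segments are ordered from upstream to downstream relative to the gene,
    so the list is reversed for a minus-strand gene (an assumption; the
    abstract does not mention strand handling).
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    left = tss - FLANK
    n_bins = 2 * FLANK // BIN_SIZE
    bins = [(left + i * BIN_SIZE, left + (i + 1) * BIN_SIZE)
            for i in range(n_bins)]
    return bins if strand == "+" else bins[::-1]


def bin_signal(bins: List[Tuple[int, int]],
               intervals: Iterable[Tuple[int, int, float]]) -> List[float]:
    """Sum signal per segment, weighting each interval by its overlap in bases.

    `intervals` holds hypothetical (start, end, value) records, for example
    rows read from a bedGraph file of one histone modification.
    """
    records = list(intervals)
    totals = [0.0] * len(bins)
    for i, (b_start, b_end) in enumerate(bins):
        for start, end, value in records:
            overlap = min(b_end, end) - max(b_start, start)
            if overlap > 0:
                totals[i] += overlap * value
    return totals


if __name__ == "__main__":
    segments = tss_bins(5000, "+")
    print(segments)
    print(bin_signal(segments, [(4100, 4300, 2.0)]))
```

Under these assumptions, repeating `bin_signal` for each of the 6 histone modifications would give 60 values per candidate promoter, which is one plausible shape for the feature vector passed to a classifier; the abstract does not confirm the exact feature layout.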

Article ID: 1803392


Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/1803392.html

