Identification of Non-coding RNAs from Different Organelle Genomes Based on Topological Secondary Structure and Reading Frames

Posted: 2018-01-09 22:04

  Source: Inner Mongolia University, doctoral dissertation, 2016. Document type: degree thesis


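  Related topics: cellular-genome non-coding RNA; open reading frame; topological secondary structure; microRNA gene cluster; functional and pathway enrichment analysis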


【Abstract】: With the rapid development of functional genomics, the functions of non-coding transcripts have attracted growing attention. More than 98% of the human genome is non-protein-coding DNA. Most of these non-coding sequences are transcribed into RNA and act directly as RNA, taking part in important biological processes such as the transcriptional regulation of genes and protein translation. In recent years, non-coding RNAs have also been found to be frequently associated with disease, DNA damage repair, and stress responses in plants. As RNA data have accumulated, short and long non-coding RNAs (ncRNAs) encoded by the nuclear genome have been shown to regulate mitochondrial function and mitochondrial dynamics. Many ncRNAs regulate nuclear genes related to organelle function and are also involved in mitochondrial morphology, metabolism, mitophagy, and mitochondria-related apoptosis. Nevertheless, our understanding of how genetic information is transferred between organelles in the form of ncRNAs has long remained limited, so understanding ncRNA-mediated information transfer between organelles is important. As ncRNA research at the organelle-genome level has deepened, it has become clear that identifying the ncRNAs transcribed from different organelle genomes helps to clarify their functions. This study systematically investigates the annotation of ncRNAs transcribed from different organelle genomes. The work includes constructing a data set of ncRNAs with different localization information at the organelle-genome level, extracting and optimizing effective feature parameters from combined ncRNA sequence and structural characteristics, building prediction algorithms, and extending those algorithms.

The gene-level complexity revealed by accumulated omics data is hard to explain in terms of the number of protein-coding genes. It has therefore been proposed that the regulatory roles of ncRNAs, long regarded as "junk" lacking protein-coding capacity, can account for this complexity. Among them, microRNAs (miRNAs) and small interfering RNAs (siRNAs) are widely regarded as playing important roles in biological regulation. In the final part of the thesis, taking microRNAs as an example, we study how ncRNAs and their target genes regulate the occurrence and development of breast cancer. Because multiple microRNAs can regulate target genes cooperatively, we selected as research objects the 15 microRNA sequences transcribed from the miR-17-92 gene cluster, which acts as a tumor suppressor in breast cancer, and from its 2 paralogous gene clusters, together with their co-regulated target genes. We studied the sequence features of the cluster-transcribed microRNAs and the regulatory roles of their co-regulated target genes in normal breast tissue and breast cancer tissue.

The main research contents of the thesis are as follows.

First, for the first time we collected and curated from the NONCODE v3.0 database the ncRNA sequences that carry organelle-genome annotation, and analyzed their sequence length distribution. To account for the effect of sequence similarity on prediction, we used the CD-HIT software to construct a data set with sequence similarity below 80%, the ncRNA_361 dataset. Starting from the simplest physicochemical properties of the bases, we discuss the physicochemical characteristics of ncRNA sequences transcribed from different organelle genomes. On this basis we further consider n-mer composition preferences under reading frames, triplet composition in the structure-sequence mode, and degenerate codon preferences. By comparing frame-free and frame-specific features for identifying ncRNA sequences transcribed from different organelle genomes, we find that the optimal reading frame is the first reading frame.

Second, the structural information of an ncRNA better reflects its spatial conformation when it performs its function, while conserved motifs reflect the pressure on the sequence over long-term evolution. For the first time, we extract topological secondary structure features and conserved motifs of ncRNA sequences as feature parameters for identifying ncRNAs at the organelle-genome level. Fusing features inevitably increases dimensionality. Drawing on previous work, we propose two different dimensionality-reduction approaches: one is a dimension-reducing mapping of the features, and the other is incremental feature selection (IFS) based on mRMR, that is, selecting an optimal feature subset. Building on the currently popular increment of diversity classifier (ID), the K-nearest neighbor classifier (KNN), and the support vector machine (SVM), we propose several combined algorithms: the improved K-minimum increment of diversity classifier (iK-MID), the improved K-nearest neighbor classifier (iKNN), and the increment of diversity combined with support vector machine (ID-SVM). Finally, by comparing the different algorithms with one another, we explore a more effective theoretical model for identifying organelle-genome ncRNAs.

Third, using bioinformatics methods, we studied the sequence features of the miRNAs transcribed from a specific miRNA gene cluster (the hsa-miR-17-92 cluster) and its paralogous gene clusters, as well as the expression levels of their co-regulated target genes in different breast tissues. We use a feedback mechanism to give a simple explanation of how these miRNAs regulate downstream genes, providing clues of research significance and value for biological experiments.
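
The abstract names reading-frame n-mer composition as a feature and the increment of diversity (ID) classifier as one of the base algorithms. A minimal sketch of how the two might fit together is given below. It is an illustration under stated assumptions, not the thesis implementation: the function names, the toy sequences, and the class labels are hypothetical; n-mers are assumed to be counted at every third position starting from the frame offset; and the diversity measure uses the commonly cited form D(X) = N log N - Σ n_i log n_i with ID(X, Y) = D(X + Y) - D(X) - D(Y), here with base-2 logarithms.

```python
import math
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; T is mapped to U below


def frame_kmer_counts(seq, k=3, frame=0):
    """Count k-mers that start at positions frame, frame+3, frame+6, ...

    Assumption: 'n-mer composition under a reading frame' means stepping
    through the sequence in codon-sized steps from the chosen offset.
    Returns counts in a fixed lexicographic order over ALPHABET.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = {m: 0 for m in kmers}
    for i in range(frame, len(seq) - k + 1, 3):
        word = seq[i:i + k]
        if word in counts:  # skip words containing ambiguous bases
            counts[word] += 1
    return [counts[m] for m in kmers]


def diversity(counts):
    """Diversity measure D(X) = N*log2(N) - sum(n_i*log2(n_i))."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log2(total) - sum(
        n * math.log2(n) for n in counts if n > 0
    )


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); small values mean similar profiles."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def classify(query, class_profiles):
    """Assign the query to the class whose profile gives the minimum ID."""
    return min(
        class_profiles,
        key=lambda label: increment_of_diversity(query, class_profiles[label]),
    )


# Toy usage with hypothetical sequences and labels.
profiles = {
    "class_a": frame_kmer_counts("AUGAUGAUGAUG"),
    "class_b": frame_kmer_counts("CCCCCCCCCCCC"),
}
predicted = classify(frame_kmer_counts("AUGAUG"), profiles)
```

In this sketch the class profiles are simply summed n-mer counts over the training sequences of each class. The combined variants named in the abstract (iK-MID, iKNN, ID-SVM) would build on such ID values, but their details are not given here and are not reproduced.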

【Degree-granting institution】: Inner Mongolia University
【Degree level】: Doctoral
【Year degree awarded】: 2016
【Classification number】: Q811.4




Article ID: 1402682


Article link: https://www.wllwen.com/shoufeilunwen/jckxbs/1402682.html

