Research on Methods for Identifying Key Alternative Splicing Events Based on RNA-Seq Data
[Abstract]: Alternative splicing is an intrinsic cellular mechanism by which a single gene can be transcribed and translated into multiple proteins with different functions. Identifying alternative splicing events is important for studying protein structure, protein diversity, cell differentiation, and species evolution. With the rapid development of high-throughput sequencing, identifying alternative splicing events from transcriptome sequencing (RNA sequencing, RNA-Seq) data has become a frontier topic in bioinformatics. However, accurately identifying exon skipping and intron retention events from RNA-Seq data remains an open problem. Existing methods have several shortcomings: they build their computational models from only part of the information related to exon skipping or intron retention events; they use low-quality sequencing reads; they apply no uniform standard for feature normalization; and they do not indicate which features are most effective for accurately identifying alternative splicing events. This thesis studies these problems in RNA-Seq-based identification of exon skipping and intron retention events. Its main work and contributions are as follows. (1) A feature analysis method for exon skipping events is proposed. Each exon is represented by multiple features related to exon skipping, the effect of each feature on accurate identification is assessed, and four feature sets are constructed to assess how different feature normalization methods affect the accuracy of exon skipping identification.
On real human skeletal muscle, brain, heart, and liver RNA-Seq data, the identification results of existing methods were merged into a reference set, and the effect of different normalized representations of the read-based features on the accurate identification of exon skipping events was analyzed. The experiments indicate that two features have an important influence on accurate identification: the reads mapped to the junction between the upstream and downstream exons of the alternative exon (that is, reads supporting the skipping junction), and the inclusion level of the alternative exon (the PSI score). They also indicate that it makes no significant difference to identification accuracy whether the features use raw read counts or normalized read counts. (2) EScall, an exon skipping identification method based on multi-feature analysis, is proposed. The method filters out reads with low mapping quality and ambiguously mapped reads; combines multiple features related to exon skipping, including reads mapped to exon interiors, junction reads connecting two exons, and gene expression information; and uses a newly designed exon skipping score to identify exon skipping events from RNA-Seq data under two different conditions. EScall was used to identify exon skipping events in real human skeletal muscle and brain RNA-Seq data.
Compared with the results of other methods, the experimental results show that EScall effectively reduces false positive and false negative identifications and yields better predictions. (3) IRcall, a new intron retention identification method based on a joint score, is proposed. It uses seven types of features related to intron retention events: reads mapped to the intron interior; junction reads supporting splicing between the exons upstream and downstream of the intron; reads mapped to the interiors of the upstream and downstream exons; reads overlapping the 5' splice site; reads overlapping the 3' splice site; the proportion of the intron covered by reads; and gene expression information. A new joint score for intron retention events, IRscore, is designed from these features and used to identify intron retention events from RNA-Seq data under two different conditions. IRcall was used to identify intron retention events in real RNA-Seq data from Arabidopsis thaliana skip mutant and wild-type plants. Compared with the results of other methods, the experimental results show that IRcall effectively reduces false positive identifications and therefore identifies intron retention events more accurately.
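As an illustration of the read-coverage feature listed in contribution (3), the following minimal sketch computes the fraction of an intron's bases that are covered by at least one read. It is not the joint score of the thesis, whose formula is not given in this abstract. The function name `intron_coverage_ratio`, the 0-based half-open coordinate convention, and the toy reads are all assumptions made for the example.

```python
def intron_coverage_ratio(intron_start, intron_end, reads):
    """Fraction of intron bases covered by at least one read.

    Coordinates are 0-based, half-open [start, end); this convention is an
    assumption of the sketch. `reads` is an iterable of (start, end) intervals.
    """
    length = intron_end - intron_start
    if length <= 0:
        raise ValueError("intron must have positive length")
    # Clip each read to the intron and drop reads that do not overlap it.
    clipped = sorted(
        (max(s, intron_start), min(e, intron_end))
        for s, e in reads
        if s < intron_end and e > intron_start
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Disjoint from the current merged block: close it, open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / length


# Hypothetical toy data: an intron at [100, 200) and four aligned reads.
toy_reads = [(90, 120), (110, 140), (180, 230), (300, 350)]
ratio = intron_coverage_ratio(100, 200, toy_reads)
```

A ratio near 1 means reads cover almost the whole intron, which is more consistent with retention than a few reads piled up at one end; in practice such a feature would be combined with the junction and splice-site features listed above.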
(4) IRclassifier, an intron retention identification method based on random forests, is proposed. A reference set is constructed from the identification results of three existing methods, each intron is represented by 21 features related to intron retention events, and a random forest classifier is built to identify intron retention events from RNA-Seq data under two different conditions, while the effect of each feature on accurate identification is also analyzed. On real Arabidopsis thaliana skip mutant and wild-type RNA-Seq data, the training set was constructed by combining the identification results of the three existing methods on chromosomes 1, 2, and 4. Intron retention events were identified using IRclassifier, and the results showed an accuracy of 99.2%. In addition, IRclassifier was used to identify intron retention events on chromosomes 3 and 5; comparison with the results of existing methods shows that IRclassifier identifies intron retention events accurately, which confirms the effectiveness of the method.
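The chromosome-based split described in contribution (4) can be sketched as follows. The abstract does not say how the results of the three methods are combined into a reference set, so the majority-vote rule used here (an intron is labelled positive if at least two of the three methods report it) is an assumption, as are all function names and the toy calls.

```python
from collections import Counter

TRAIN_CHROMS = {"1", "2", "4"}  # chromosomes used for training in the abstract
TEST_CHROMS = {"3", "5"}        # chromosomes held out for evaluation


def build_reference_labels(introns, method_calls, min_votes=2):
    """Label each intron by voting across methods.

    `introns` is a list of (chrom, start, end) tuples; `method_calls` is a
    list of sets of introns reported as retained by each method. The
    min_votes=2 threshold is an assumption of this sketch, not a rule
    stated in the abstract.
    """
    votes = Counter(i for calls in method_calls for i in calls)
    return {intron: votes[intron] >= min_votes for intron in introns}


def split_by_chromosome(labels):
    """Split labelled introns into training and test sets by chromosome."""
    train = {i: y for i, y in labels.items() if i[0] in TRAIN_CHROMS}
    test = {i: y for i, y in labels.items() if i[0] in TEST_CHROMS}
    return train, test


# Hypothetical toy introns and calls from three unnamed methods.
introns = [("1", 100, 200), ("2", 50, 90), ("3", 10, 80), ("5", 400, 520)]
method_calls = [
    {("1", 100, 200), ("3", 10, 80)},
    {("1", 100, 200), ("5", 400, 520)},
    {("3", 10, 80)},
]
labels = build_reference_labels(introns, method_calls)
train, test = split_by_chromosome(labels)
```

Splitting by chromosome rather than at random keeps every test intron on a chromosome the classifier never saw during training. The labelled training introns, each paired with its feature vector, could then be passed to any random forest implementation.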
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Doctoral
[Year degree awarded]: 2016
[Classification number]: Q78
Article ID: 2309554
Article link: https://www.wllwen.com/shoufeilunwen/jckxbs/2309554.html