
Research on the Prediction of Multi-Stem-Loop pre-miRNAs Based on Computational Intelligence Methods

Published: 2018-07-03 05:25

  Topics: pre-miRNA + SMOTE; Source: University of Jinan, 2016 master's thesis


[Abstract]: Genetic information in DNA is transcribed into mRNA, and mRNA is translated into protein at the ribosome. This has long been the accepted understanding of the central dogma of biology, but the discovery of microRNA (miRNA) changed that initial picture. miRNAs are an important class of short non-coding RNA genes (about 21~23 nucleotides long) that bind target mRNAs through complementary base pairing and thereby either degrade the mRNA or repress its translation, which in turn affects gene expression. Recent studies have found that miRNAs regulate about 20%~30% of human gene expression. miRNAs take part in physiological metabolism, growth and development, and cell proliferation and apoptosis, and experiments have shown that they are also intricately linked to the occurrence of cancer. Studying miRNAs in depth will therefore help reveal how gene regulatory networks work and will also guide the exploration of biological evolution. Our work covers four aspects. (1) We extracted 695 human pre-miRNA samples from the miRBase database, of which 691 remained after redundancy removal. We obtained 8494 non-redundant pseudo-hairpin sequences from human RefSeq genes, and extracted 1020 non-coding RNA sequences (excluding miRNAs) from the manually annotated human non-coding RNA database built by Lander; after removing redundant sequences and sequences longer than 150 bases, 754 remained. To address the class imbalance of this data set, we used both a sample-data preprocessing method and an internal method to balance the positive and negative sets. (2) We adopted the 29 global and intrinsic features used by miPred, currently the best-performing prediction method, and added 19 physicochemical and structural features. Because selecting the most discriminative features reduces system complexity and improves the efficiency of the prediction model, we applied wrapper and filter methods to these 48 features and retained 21 of them: 7 miPred features and 14 newly introduced structural features. This also shows that the newly introduced structural features are more discriminative than the sequence features. (3) Given that artificial neural networks offer self-learning, self-adaptation and self-organization, we first chose an artificial neural network as the prediction model. Under 5-fold cross-validation the prediction accuracy was 93.58%, clearly higher than that of other methods such as triplet-SVM and MiPred. (4) We applied the neural network model to 6095 pre-miRNAs from other (non-human) animals and to 139 viral pre-miRNAs from miRBase, obtaining accuracies of 97.18% and 94.24% respectively. Both results are a substantial improvement, showing that the artificial neural network model we built can predict miRNAs effectively and offers a new line of research for miRNA prediction.
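The abstract does not say which preprocessing method was used to balance the 691 positive samples against the much larger negative set (8494 pseudo hairpins plus 754 other non-coding RNAs), but the topic tags name SMOTE. As an illustration only, the following is a minimal sketch of standard SMOTE oversampling, assuming Euclidean distance on numeric feature vectors. The function name `smote`, its parameters, and the toy data are hypothetical and not taken from the thesis.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a minority sample
    and one of its k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D feature vectors (hypothetical values, not data from the thesis).
positives = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85], [0.3, 0.7]]
negatives = [[0.9, 0.1]] * 10

# Oversample the minority (positive) class until both classes are equal in size.
extra = smote(positives, n_synthetic=len(negatives) - len(positives), k=3)
balanced_positives = positives + extra
print(len(balanced_positives), len(negatives))
```

Each synthetic point lies on the segment between two real minority samples, so the minority class grows without exact duplicates. In a cross-validated experiment, oversampling would normally be applied to the training folds only, so that synthetic points never leak into the test fold.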
[Degree-granting institution]: University of Jinan
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: Q52;TP183

Article ID: 2092567


Article link: https://www.wllwen.com/kejilunwen/zidonghuakongzhilunwen/2092567.html

