Research on Ensemble Feature Selection and Gene Regulatory Network Construction

Published: 2019-04-26 13:27
[Abstract]: With the rapid development of bioinformatics technology and the emergence of massive amounts of genomic data, research has entered the post-genomic era. Researchers are no longer limited to studying the functions of individual genes; they seek to understand, from a systems perspective, the complex processes that sustain life. Against this background, systems biology has developed rapidly. One of its central challenges is the construction of gene regulatory networks (GRNs), which describe the interactions between genes in graphical form. Reconstructing GRNs by reverse engineering can help us better understand the molecular mechanisms that keep organisms stable when environmental conditions fluctuate. With advances in DNA microarray technology and the rapid accumulation of gene expression data, many GRN construction methods have been proposed. Gene sequence data and functional annotation data are also becoming increasingly available. Different types of data often carry different information, so making effective use of the complementarity among multiple data sources is essential for constructing GRNs accurately.

Methods that construct GRNs from gene expression data by feature selection usually output only an importance score for each potential edge, without determining an appropriate threshold for converting the resulting ranking into a network structure. To address this shortcoming, this thesis proposes the Ensemble Feature Importance-Genetic Algorithm (EFI-GA), which combines ensemble feature selection with a genetic algorithm to construct GRNs. First, an ensemble feature selection method computes an importance score for each potential regulator of a target gene; the score expresses the confidence that a true regulatory relationship exists between that regulator and the target gene. Then a genetic algorithm selects the optimal subset of regulators from among the high-confidence candidates. Experimental results on the Dialogue for Reverse Engineering Assessments and Methods (DREAM) datasets demonstrate the effectiveness of the method.

To respond to external environmental stimuli or to carry out a particular biological process, transcription factors (TFs) act by regulating their target genes, and the two jointly participate in the same process; consequently, a TF and its targets often have identical or similar functions. Taking the functional correlation between TFs and target genes into account should therefore improve the accuracy of the constructed regulatory network. This thesis proposes a multi-feature fusion method that integrates gene expression data, gene sequence data, and Gene Ontology (GO) data, so that the complementary properties of the different data sources are used to improve the accuracy of GRN construction. Feature vectors are built from the multiple data sources, and a support vector machine (SVM) classifier is trained to predict regulatory relationships between TFs and target genes. Cross-validation results on Arabidopsis and tomato datasets show that the proposed method achieves higher accuracy.
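The first EFI-GA stage (scoring every potential regulator of a target gene) can be sketched as follows. The abstract does not say which feature-selection methods make up the ensemble, so this minimal sketch assumes a simple data-perturbation ensemble: on each bootstrap resample of the expression samples, candidate regulators are ranked by absolute Pearson correlation with the target, and the rank-normalized scores are averaged over rounds. The base ranker, the function names ensemble_importance and pearson, and the parameters n_rounds and seed are illustrative assumptions, not the thesis's implementation.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def ensemble_importance(expr, target, regulators, n_rounds=50, seed=0):
    """Score every candidate regulator of `target` in [0, 1] (hypothetical helper).

    expr maps gene name -> expression profile (one value per sample).
    Each round draws a bootstrap resample of the samples, ranks the candidates by
    |Pearson correlation| with the target (assumed base ranker), and rank-normalizes
    the ranks to [0, 1]; the final score is the mean over rounds, so higher means
    more confidence that the regulator -> target edge is real.
    """
    rng = random.Random(seed)
    n = len(expr[target])
    totals = dict.fromkeys(regulators, 0.0)
    denom = max(len(regulators) - 1, 1)
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample of samples
        y = [expr[target][i] for i in idx]
        raw = {r: abs(pearson([expr[r][i] for i in idx], y)) for r in regulators}
        # Rank-normalize so every round contributes on the same scale.
        for rank, r in enumerate(sorted(regulators, key=raw.get)):
            totals[r] += rank / denom
    return {r: totals[r] / n_rounds for r in regulators}
```

A score near 1 means the candidate ranked first in almost every resample. The scores order the candidate edges but, as the abstract points out, do not by themselves fix a cut-off for the network.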
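The second EFI-GA stage replaces a fixed score threshold with a genetic-algorithm search over regulator subsets. The abstract gives neither the fitness function nor the GA operators, so this sketch assumes, hypothetically, that fitness is the negative BIC of a least-squares linear model predicting the target from the chosen regulators, that only the top_k highest-scoring candidates are searched (standing in for "regulators with high confidence"), and that the GA uses tournament selection, uniform crossover, bit-flip mutation, and elitism. The names ga_select and bic and all parameter values are illustrative.

```python
import math
import random


def _center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def _solve(a, b):
    """Solve a·x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def bic(expr, target, subset, ridge=1e-8):
    """BIC of a linear model for the centered target given `subset` (lower is better).

    Assumed fitness criterion: n*ln(RSS/n) + k*ln(n), with k = number of regulators.
    """
    y = _center(expr[target])
    n = len(y)
    resid = y
    if subset:
        cols = [_center(expr[g]) for g in subset]
        k = len(cols)
        # Normal equations X^T X w = X^T y; a tiny ridge keeps the system well posed.
        xtx = [[sum(a * b for a, b in zip(cols[i], cols[j])) + (ridge if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        xty = [sum(a * b for a, b in zip(c, y)) for c in cols]
        w = _solve(xtx, xty)
        resid = [y[t] - sum(w[i] * cols[i][t] for i in range(k)) for t in range(n)]
    rss = sum(r * r for r in resid) + 1e-12
    return n * math.log(rss / n) + len(subset) * math.log(n)


def ga_select(expr, target, scores, top_k=8, pop_size=30, generations=40, p_mut=0.1, seed=0):
    """Choose a regulator subset for `target` among its top_k highest-scoring candidates."""
    rng = random.Random(seed)
    cands = sorted(scores, key=scores.get, reverse=True)[:top_k]
    cache = {}

    def decode(mask):
        return [g for g, bit in zip(cands, mask) if bit]

    def fitness(mask):  # higher is better
        key = tuple(mask)
        if key not in cache:
            cache[key] = -bic(expr, target, decode(mask))
        return cache[key]

    pop = [[rng.random() < 0.5 for _ in cands] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        nxt = ranked[:2]  # elitism: the two best masks survive unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(ranked, 3), key=fitness)  # tournament selection
            p2 = max(rng.sample(ranked, 3), key=fitness)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]  # uniform crossover
            child = [(not bit) if rng.random() < p_mut else bit for bit in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return decode(max(pop, key=fitness))
```

Under this assumed fitness, the ln(n) penalty per regulator is what lets the search decide how many regulators to keep, the role a hand-picked score threshold would otherwise play.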
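The multi-feature fusion classifier can be sketched in the same spirit. The abstract states only that feature vectors are built from expression, sequence, and GO data and classified with an SVM; the concrete features here are assumptions: the absolute Pearson correlation of the two expression profiles, a capped count of exact hits of a hypothetical per-TF motif in the target's promoter sequence, and the Jaccard similarity of the two genes' GO-term sets. To stay self-contained, the linear SVM is trained with Pegasos (stochastic sub-gradient descent on the regularized hinge loss) rather than an SVM library, and it centers the features and omits a separate bias term, which is a simplification. All function names are illustrative.

```python
import random
import statistics


def abs_pearson(x, y):
    """Expression feature (assumed): |Pearson correlation| of two profiles."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else abs(sxy) / (sxx * syy) ** 0.5


def motif_score(promoter, motif, cap=3):
    """Sequence feature (hypothetical): overlapping exact motif hits, capped and scaled to [0, 1]."""
    k = len(motif)
    hits = sum(promoter[i:i + k] == motif for i in range(len(promoter) - k + 1))
    return min(hits, cap) / cap


def go_similarity(terms_a, terms_b):
    """GO feature (assumed): Jaccard similarity of two GO-term sets."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def pair_features(tf, target, expr, promoters, motifs, go):
    """Fuse the three data sources into one feature vector for a (TF, target) pair."""
    return [
        abs_pearson(expr[tf], expr[target]),
        motif_score(promoters[target], motifs[tf]),
        go_similarity(go[tf], go[target]),
    ]


def train_linear_svm(xs, ys, lam=0.01, epochs=100, seed=0):
    """Linear SVM trained with Pegasos; labels must be +1/-1.

    Features are centered on their training means and no separate bias is
    learned (a simplification). Returns (weights, means).
    """
    rng = random.Random(seed)
    d = len(xs[0])
    mu = [sum(x[j] for x in xs) / len(xs) for j in range(d)]
    data = [([a - m for a, m in zip(x, mu)], y) for x, y in zip(xs, ys)]
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularizer (shrink) step
            if margin < 1.0:  # hinge loss active: step toward y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w, mu


def predict(model, x):
    """+1 = predicted regulatory relationship, -1 = no relationship."""
    w, mu = model
    score = sum(wi * (xi - m) for wi, xi, m in zip(w, x, mu))
    return 1 if score >= 0 else -1
```

A typical call builds one vector per labeled (TF, target) pair with pair_features, trains with train_linear_svm, and scores unseen pairs with predict; the cross-validation the abstract reports for the Arabidopsis and tomato datasets would wrap this train/predict cycle in k folds.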
[Degree-granting institution]: Dalian University of Technology
[Degree level]: Master's
[Year awarded]: 2016
[Classification number]: Q811.4; TP18
