Analysis of Transcription Factor Binding Sites Based on Structural Data

Published: 2018-04-16 16:23

  Topics: gene regulation + protein-nucleic acid complexes; Source: Southeast University, master's thesis, 2005


[Abstract]: Genes are the physical and functional units of genetic information, and how a gene's function is realized depends on its structure and on the regulation of its expression. Regulation of gene expression is the operator that turns these gene structures into a wide variety of gene functions, so studying it is of great significance for revealing the mysteries of life. Regulation of mRNA transcription initiation is the basic control point of regulation and also its most important step. In essence, transcription factors bind the corresponding regulatory elements and affect the activity of RNA polymerase, thereby affecting the transcription level of the gene. Because few studies internationally have approached this problem from a structural perspective, this thesis starts from structural data on protein-nucleic acid complexes, analyzes amino acid-base interaction pairs, and explores methods for predicting transcription factor binding sites.

We retrieved all protein-nucleic acid complexes recorded in the PDB macromolecular structure database. Using software that computes interactions within complexes, we processed the three-dimensional structural data of these complexes to obtain the possible interaction pairs between amino acid side chains and nucleic acids. Based on the protein annotations in the SWISSPROT database, the complexes were then divided into a set related to regulatory processes and a set unrelated to regulation. We statistically analyzed the interaction pairs between amino acid side chains and nucleic acids in the complexes, including hydrogen bonds and non-bonded interactions. By analyzing the local environment of the amino acid residues through which transcription factors interact with DNA, we found that certain three-residue or five-residue segments always bind DNA. We therefore hypothesize that, in transcription-related protein-nucleic acid complexes, the amino acid residue environment or base environment determines to some extent whether the central residue or base is an interaction site.

We processed a non-redundant data set of protein-nucleic acid complexes, extracted binding sequences and interaction-site information, and used machine learning to make a preliminary exploration of the binding patterns between proteins and nucleic acids. We built a back-propagation neural network that makes full use of protein-nucleic acid binding information, tuned its parameters, and trained it repeatedly to predict the binding residues of DNA-binding proteins. Local environment information predicted the binding residues of a given protein reasonably well, with an NP of 65.85%. We used a support vector machine (SVM) to predict binding bases, training and predicting with different window widths and parameters, and compared the results with the neural network method; a base environment with a window length of 11 gave relatively good prediction performance. With a radial basis function kernel, the proportion of bases that the SVM successfully predicted as binding bases reached 89.72%, with a sensitivity of 66.71%. Finally, comparing the two methods, we found that the SVM was comparatively successful at predicting binding bases.

Using interaction-pair data from protein-nucleic acid complexes, this thesis makes a preliminary exploration, by machine learning, of the binding patterns between proteins and nucleic acids, and demonstrates that in transcription-related protein-nucleic acid complexes the local residue environment or base environment determines to some extent whether the central residue or base is an interaction site.
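
The abstract does not say how the "base environment" was encoded or how the reported rates were computed, so the following is only a minimal sketch under stated assumptions. It shows one common way to turn each base into a feature vector from an 11-base window centred on it, and how sensitivity and a precision-style rate can be derived from predictions. The one-hot encoding, the zero padding at sequence ends, the function names, and the reading of the 89.72% figure as precision are all assumptions for illustration, not details taken from the thesis.

```python
from typing import List, Sequence, Tuple

BASES = "ACGT"
PAD = "N"  # hypothetical padding symbol for positions beyond the sequence ends


def one_hot(base: str) -> List[int]:
    """Encode one base as 4 bits; padding or unknown symbols become all zeros."""
    return [1 if base == b else 0 for b in BASES]


def window_features(seq: str, width: int = 11) -> List[List[int]]:
    """Return one feature vector per base, built from the window centred on it."""
    if width % 2 == 0:
        raise ValueError("width must be odd so the window has a central base")
    half = width // 2
    padded = PAD * half + seq.upper() + PAD * half
    features = []
    for i in range(len(seq)):
        vec: List[int] = []
        for base in padded[i:i + width]:
            vec.extend(one_hot(base))
        features.append(vec)
    return features


def sensitivity_and_precision(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Compute sensitivity TP/(TP+FN) and precision TP/(TP+FP) for label 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    return sens, prec


if __name__ == "__main__":
    feats = window_features("ACGTACGTACGTA")  # toy sequence, not real data
    print(len(feats), len(feats[0]))
    print(sensitivity_and_precision([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Vectors of this shape could then be passed to any classifier, for example an SVM with a radial basis function kernel such as scikit-learn's `SVC(kernel="rbf")`; that pairing is a suggestion here, not the software the thesis reports using.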
[Degree-granting institution]: Southeast University
[Degree level]: Master's
[Year degree conferred]: 2005
[Classification number]: R346
