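A Study on the Diversity of Non-Coded Amino Acids in Human Sperm
Published: 2018-03-16 06:11
Topic: proteome   Entry point: non-coded amino acids   Source: Shandong University, 2017 master's thesis   Type: degree thesis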
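【Abstract】: Proteins are usually defined by translation of the coding sequences of the genome. However, because of post-translational modifications, amino acid substitutions and other causes, their amino acid residues are rarely determined by the genome alone; in practice, residues are often altered, which changes protein structure and affects protein function. Yet the actual amino acid residues in living organisms are rarely determined directly by proteomics, mainly because residues that differ from the encoded amino acids are usually ignored by common search algorithms, and secondarily because protein sequencing techniques generally depend on a database of theoretically translated proteins. Assuming that one or more undefined non-coded amino acid residues are present in a peptide sequence therefore offers a way to explain spectra that cannot be matched to any peptide. In early methods, partial peptide sequences derived from unmatched spectra were used as tags to search a database of theoretically translated genomic proteins, and the results revealed unexpected post-translational modifications and amino acid substitutions. Unrestricted search algorithms were later used to identify non-coded amino acid residues without knowing in advance whether they existed. The mass-tolerant approach was originally used to detect known modifications by allowing a mass difference between a precursor and its fragments; it has recently been improved to match peptide sequences carrying a wide range of mass differences or undefined modifications by allowing a broad mass tolerance, and has uncovered many modifications. The main problems of these methods, however, remain a high false-positive rate, low sensitivity and long search times. Here we systematically studied all possible amino acid residues in human sperm cells whose relative molecular mass differs from that of the amino acid encoded in the genome sequence; we call these non-coded amino acids (ncAAs). By measuring the mass difference between each encoded amino acid and the actual protein residue, we found more than one million amino acids with a non-zero mass difference, that is, amino acids whose side chain has been altered. Gaussian mixture analysis and iterative regression analysis of these mass differences identified 424 high-confidence Gaussian clusters, and a decision tree built with a machine-learning algorithm identified 849 high-confidence ncAAs distributed over 35,274 protein sites. Of these, 180 mass-difference clusters correspond to amino acid side-chain structures that have never been reported, and 105 ncAAs match known types of amino acid substitution, 40 of which were confirmed by transcriptome sequencing. In addition, ANOVA showed that some ncAAs have a specific distribution within the normal population, suggesting that they may be associated with population differences. Other ncAAs are distributed differently between patients with severe oligoasthenozoospermia and normal individuals, suggesting that they are related to the disease mechanism; some of these phosphorylation sites have been reported in earlier studies. Our results show that ncAAs are widespread in sperm cells, arising mainly from nucleotide polymorphisms, post-translational modifications and some unknown mechanisms, and that they are of significant value for disease diagnosis and targeted drug therapy.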
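The abstract reports that ANOVA was used to compare how ncAAs are distributed across groups of donors. As a hedged illustration only, the sketch below computes the one-way ANOVA F statistic from scratch. The function name `one_way_anova_f` and the toy frequency values are hypothetical; the thesis does not state its exact ANOVA design or software.

```python
import math
from statistics import fmean


def one_way_anova_f(groups: list[list[float]]) -> float:
    """One-way ANOVA F statistic: between-group / within-group mean square.

    Each group must be non-empty (hypothetical helper, not from the thesis).
    """
    k = len(groups)                   # number of groups
    n = sum(len(g) for g in groups)   # total number of observations
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]
    # Between-group sum of squares: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: scatter of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        # Every group is internally constant: F is unbounded (or undefined if means agree).
        return math.inf if ss_between > 0 else math.nan
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical per-sample frequencies of one ncAA in two donor groups.
normal = [1.0, 2.0, 3.0]
patients = [4.0, 5.0, 6.0]
print(one_way_anova_f([normal, patients]))  # 13.5
```

The function returns only the statistic. A p-value additionally needs the F distribution with (k - 1, n - k) degrees of freedom; `scipy.stats.f_oneway` returns both the statistic and the p-value if SciPy is available.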
[Abstract]: Proteins are usually determined by translation of the coding sequences of the genome. However, because of post-translational modification, amino acid substitution and other causes, their amino acid residues are rarely determined directly by the genome. In practice, amino acid residues often change, thereby changing protein structure and affecting protein function. At present, however, amino acid residues in organisms are rarely determined directly by proteomics, mainly because residues that differ from the encoded amino acids are usually ignored by common search algorithms, and secondarily because protein sequencing techniques generally depend on a database of theoretically translated proteins. Assuming that one or more undefined non-coded amino acid residues are present in a peptide sequence offers a breakthrough for explaining unmatched peptide spectra. In early methods, partial peptide sequences derived from unmatched spectra were used as tags to search a database of theoretically translated genomic proteins, which revealed unexpected post-translational modifications and amino acid substitutions. Unrestricted search algorithms were later used to identify non-coded amino acid residues without knowing whether they exist. The mass-tolerant method, originally used to detect known modifications by allowing mass differences between precursors and their fragments, has recently been improved: by allowing a broad mass tolerance, it matches peptide sequences containing a wide range of mass differences or undefined modifications, and many modifications have been found this way. The main problems with these methods, however, remain a high false-positive rate, low sensitivity and long search times. Here we systematically studied all possible amino acid residues in human sperm cells whose relative molecular masses differ from those of the amino acids encoded in the genomic sequence; we call these non-coded amino acids (ncAAs).
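The mass-tolerant idea can be illustrated at the precursor level: compute the theoretical mass of a candidate peptide, subtract it from the observed mass, and accept the match if the difference falls inside a deliberately wide window. This is a minimal sketch under stated assumptions, not the thesis's search engine: it uses standard monoisotopic residue masses, treats the observed precursor mass as a neutral mass, and the ±500 Da window and the names `peptide_mass` and `open_search_delta` are hypothetical.

```python
from typing import Optional

# Standard monoisotopic residue masses (Da); cysteine is left unmodified here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # H2O added for the free N- and C-termini


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def open_search_delta(sequence: str, observed_mass: float,
                      tolerance: float = 500.0) -> Optional[float]:
    """Return observed - theoretical mass if within +/- tolerance (Da), else None.

    A narrow tolerance only accepts unmodified matches; a wide (hypothetical)
    window lets a spectrum match a peptide carrying an unknown mass shift,
    which is returned as the delta for downstream analysis.
    """
    delta = observed_mass - peptide_mass(sequence)
    return delta if abs(delta) <= tolerance else None


observed = peptide_mass("PEPTIDE") + 79.96633  # hypothetical phosphopeptide mass
print(round(open_search_delta("PEPTIDE", observed), 3))  # 79.966
```

With the wide window, the match is accepted with a delta of about +79.966 Da, the monoisotopic shift of phosphorylation; calling `open_search_delta("PEPTIDE", observed, tolerance=0.02)` rejects the same match, which is why conventional narrow-tolerance searches miss such residues.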
By measuring the mass difference between the encoded amino acids and the actual protein residues, we found more than one million amino acids with non-zero mass differences, that is, amino acids with altered side chains. Gaussian mixture analysis and iterative regression analysis of these mass differences identified 424 high-confidence Gaussian clusters, and a decision tree built with a machine-learning algorithm identified 849 high-confidence ncAAs distributed over 35,274 protein sites. Among them, 180 mass-difference clusters show amino acid side-chain structures that have never been reported, and 105 ncAAs match known types of amino acid substitution, 40 of which were confirmed by transcriptome sequencing. In addition, ANOVA showed that some ncAAs have a specific distribution in the normal population, suggesting that they may be related to population differences. Other ncAAs are distributed differently between patients with severe oligoasthenozoospermia and normal individuals, suggesting that they may be related to the pathogenesis of the disease; some of these phosphorylation sites have been reported in previous studies. Our studies show that ncAAs are widespread in sperm cells, mainly owing to nucleotide polymorphisms, post-translational modifications and unknown mechanisms, and that they are important for disease diagnosis and targeted drug therapy.
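The Gaussian mixture step groups the pooled mass differences into clusters. A minimal sketch of the underlying idea is expectation-maximization (EM) for a one-dimensional Gaussian mixture; choosing the number of components, the thesis's iterative regression and its decision-tree filtering are not shown. The function names, starting means, starting width and toy values (placed near the +15.995 Da oxidation and +79.966 Da phosphorylation shifts) are assumptions for illustration.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(xs, init_means, init_sigma=1.0, n_iter=100, min_sigma=1e-4):
    """Fit a 1-D Gaussian mixture to mass differences (Da) with EM.

    Returns one (weight, mean, sigma) tuple per component, in the order of
    init_means (hypothetical helper, not the thesis's implementation).
    """
    k = len(init_means)
    weights = [1.0 / k] * k
    means = list(init_means)
    sigmas = [init_sigma] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
            total = sum(p)
            # A point far from every component underflows to 0; give it no weight.
            resp.append([pj / total for pj in p] if total > 0 else [0.0] * k)
        # M-step: re-estimate each component from its weighted observations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue  # component attracted no data; keep previous values
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)  # floor avoids collapse
    return list(zip(weights, means, sigmas))


# Hypothetical observed-minus-encoded mass differences (Da).
shifts = [15.990, 15.995, 16.000, 15.993, 15.997,
          79.960, 79.966, 79.972, 79.964, 79.968]
for w, m, s in fit_gmm_1d(shifts, init_means=[15.0, 80.0]):
    print(f"weight={w:.2f} mean={m:.3f} sd={s:.4f}")
```

On these toy values the two components should settle at means of about 15.995 and 79.966 Da with equal weights. Starting from a broad width (1 Da here) matters: with a very narrow starting width, every density can underflow to zero and EM never moves.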
【Degree-granting institution】: Shandong University
【Degree level】: Master's
【Year degree awarded】: 2017
【Classification number】: R321.1
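【Similar literature】
Related master's theses (first 2)
1 Zhang Chen (张晨); Construction of an Expression System for UAA-Encoded Amino Acids [D]; Jilin University; 2017
2 Chen Xinjun (陈新骏); A Study on the Diversity of Non-Coded Amino Acids in Human Sperm [D]; Shandong University; 2017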
Article ID: 1618651
Article link: https://www.wllwen.com/yixuelunwen/jichuyixue/1618651.html