Research on Drug-Likeness and Activity Prediction of Compounds in a Big Data Environment

Published: 2018-01-18 12:28

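Keywords: Research on Drug-Likeness and Activity Prediction of Compounds in a Big Data Environment　Source: Xinjiang University, 2016 master's thesis　Document type: degree thesis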

[Abstract]: In drug development, high-quality lead compounds must be screened from massive compound databases to raise the success rate of drug design. The concept of drug-likeness emerged for this purpose: medicinal chemists use it to study molecular structural features and properties, and have distilled drug-likeness prediction criteria from that work. The study of structure-activity relationships is another important approach to drug design, and in discovering and developing new drugs, studying compound activity is as important as studying drug-likeness. Most traditional activity studies measure a compound's pharmacological activity through in vivo animal testing and assays, which costs a great deal of time and money when the compound data are massive. Modern activity studies instead predict the activity of unknown compounds by building quantitative structure-activity relationship (QSAR) models with mathematical methods. With the continued development of data mining technology, machine learning has become an active research area in computer science, and researchers have applied it to improve the efficiency of drug activity prediction. However, most existing studies use shallow machine learning algorithms; when known samples and computing units are limited, their ability to generalize to complex problems falls short of requirements and they cannot learn more useful features. These studies also generally rely on small sample sets and achieve low accuracy, which limits their practicality for today's massive compound data.

This thesis takes massive compound data as its subject and, combining deep learning methods, builds prediction models for the drug-likeness and drug activity of massive compound sets. The work has two parts:

(1) Drug-likeness prediction model for massive compound sets. Based on the structure of compound data, the first part builds a distributed computing model for fast drug-likeness prediction and, on top of this model, quickly and efficiently screens out drug-like compounds according to drug-likeness evaluation rules. The model uses a segmented hash algorithm based on a divide-and-conquer strategy, together with a continuous-attribute discretization method that discretizes continuous numerical data not suited to hash retrieval.

(2) Drug activity prediction model for massive compound sets. The second part introduces shallow and deep machine learning methods and their characteristics. Then, using molecular descriptors as features, it builds drug activity prediction models with each kind of learning method and compares the two.

Experimental results show that the deep learning model adopted in this thesis is suitable for predicting the drug-likeness and drug activity of massive compound sets, and can quickly and effectively screen drug-like compounds and predict their activity. The model shows stable scalability and high efficiency, and its correctness is also borne out.
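The idea of discretizing a continuous attribute so that it can be looked up through hashing, with the data split into independently indexed segments, can be illustrated with a minimal sketch. This is not the thesis's implementation: the equal-width binning, the function names, the toy compounds, and the `mw` field are all assumptions made for illustration, and the 500 threshold simply echoes the molecular-weight cutoff of Lipinski's rule of five, since the abstract does not say which evaluation rules are used.

```python
from collections import defaultdict
from math import floor


def discretize(value, low, width):
    """Map a continuous value to an integer bin index (equal-width bins)."""
    return floor((value - low) / width)


def build_segment_index(segment, attr, low, width):
    """Hash index for one data segment: bin index -> [(compound id, value)]."""
    index = defaultdict(list)
    for compound in segment:
        value = compound[attr]
        index[discretize(value, low, width)].append((compound["id"], value))
    return index


def query_at_most(indexes, threshold, low, width):
    """Return ids with value <= threshold, assuming all values are >= low."""
    edge = discretize(threshold, low, width)
    hits = []
    for index in indexes:  # each segment is searched independently
        # Bins strictly below the threshold's bin qualify without inspection.
        for b in range(edge):
            hits.extend(cid for cid, _ in index.get(b, []))
        # The bin that contains the threshold needs an exact comparison.
        hits.extend(cid for cid, v in index.get(edge, []) if v <= threshold)
    return sorted(hits)


# Hypothetical compounds; "mw" stands for molecular weight.
compounds = [
    {"id": "c1", "mw": 180.2},
    {"id": "c2", "mw": 499.9},
    {"id": "c3", "mw": 500.0},
    {"id": "c4", "mw": 612.7},
    {"id": "c5", "mw": 342.3},
]
segments = [compounds[:3], compounds[3:]]  # divide the data into segments
indexes = [build_segment_index(s, "mw", 0.0, 100.0) for s in segments]
result = query_at_most(indexes, 500.0, 0.0, 100.0)
```

In a distributed setting each segment's index could be built and queried on a separate worker, with the per-segment hits merged at the end; here the segments are simply processed in a loop.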
[Degree-granting institution]: Xinjiang University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP311.13;R91
