
Research and Implementation of an SVM-Based System for Classifying and Predicting Compound Mutagenicity

Published: 2018-12-25 15:09
[Abstract]: As science and technology advance, more and more drugs are being developed to fight disease, but drug development consumes large amounts of material and human resources and has a long development cycle. Throughout development, a drug's five ADMET properties (absorption, distribution, metabolism, excretion, and toxicity) must be considered, and among the toxicity endpoints, mutagenicity is closely related to cancer. In the final stage of development, animal and human trials, a drug's mutagenicity in humans is tested. Many drugs are abandoned at this stage because mutagenicity experiments show that they do too much harm to animals or humans, which wastes the funds invested in all earlier stages. In recent years, pattern recognition has developed rapidly and been applied across many fields; bioinformatics and drug development are among its important research directions. The main function of the system is to use machine learning algorithms to predict and classify the mutagenicity of compounds, and to use the classification model to analyze the compound features related to mutagenicity. The system supplies a large number of compounds and their feature attributes as the training set for the classification model, including the results of mutagenicity studies on these compounds by various research institutions. It provides users with compound feature calculation, feature selection, data cleaning, classification model building, mutagenicity prediction, result analysis, and saving of result files. Researchers can use the prediction results to analyze the key features that affect compound mutagenicity.
The system is developed in Java, built on the Spring MVC framework, and uses a MySQL database to store compound features, personal information, and other data. It implements a data processing module, a prediction and classification module, a result analysis module, a system management module, and a personal information module. In the data processing module, the system computes a 1446-dimensional feature descriptor for each compound from its SMILES string, handles missing values and normalizes the feature data, and then reduces the dimensionality of the features with feature selection algorithms such as information gain, CFS, and Relief. In the prediction and classification module, the system uses a support vector machine (SVM) model and applies the AdaBoost algorithm to boost it iteratively, which improves prediction accuracy. Validated with several forms of cross-validation and an independent test set, the system predicts compound mutagenicity fairly accurately, reaching an accuracy of 83.5%. It meets users' needs in both functionality and performance and achieves the expected research outcome.
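The data processing steps named in the abstract (missing-value handling, normalization, and information-gain feature selection) can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not say which imputation, scaling, or discretization scheme was used, so mean imputation, min-max scaling, and equal-width binning are assumptions made here, the function names are hypothetical, and the toy descriptor values are invented for illustration.

```python
import math
from collections import Counter


def impute_mean(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns become all zeros."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp > 0 else 0.0 for v, lo, sp in zip(r, lows, spans)]
        for r in rows
    ]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(column, labels, bins=2):
    """IG(Y; X) = H(Y) - H(Y | X), with X cut into equal-width bins on [0, 1]."""
    binned = [min(int(v * bins), bins - 1) for v in column]
    conditional = 0.0
    for b in set(binned):
        subset = [y for x, y in zip(binned, labels) if x == b]
        conditional += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(rows, labels, bins=2):
    """Return (feature_index, information_gain) pairs, highest gain first."""
    cols = list(zip(*rows))
    gains = [(j, information_gain(col, labels, bins)) for j, col in enumerate(cols)]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Toy data (invented): 3 descriptors for 6 compounds; label 1 = mutagenic.
raw = [
    [0.9, 5.0, 1.0],
    [0.8, None, 2.0],
    [1.0, 5.0, 1.5],
    [0.1, 5.0, 1.2],
    [0.2, 5.0, None],
    [0.0, 5.0, 1.8],
]
labels = [1, 1, 1, 0, 0, 0]

clean = min_max_normalize(impute_mean(raw))
ranking = rank_features(clean, labels)
```

In this toy set the first descriptor separates the two classes perfectly and so ranks first, while the constant second descriptor carries no information. A real pipeline would keep the top-ranked descriptors and pass them on to the classifier.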
[Degree-granting institution]: Liaoning University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TQ460; TP181



