Screening anti-fibrosis Chinese medicinal compounds based on machine learning
  • English title: Screening anti-fibrosis Chinese medicinal compounds based on machine learning
  • Authors: Wang Xiting (王曦廷); Li Yu (李彧); Zhang Lan (张澜); Liu Meng (刘梦); Li Cheng (李城); Yang Qiushi (杨秋实); Hang Xiaoyi (杭晓屹); Liu Yi (刘祎)
  • Affiliation: School of Traditional Chinese Medicine, Beijing University of Chinese Medicine
  • Keywords: organ fibrosis; machine learning; molecular fingerprint; Chinese medicinal compound screening
  • Journal code: JZYB
  • Journal: Journal of Beijing University of Traditional Chinese Medicine (北京中医药大学学报)
  • Publication date: 2019-01-30
  • Publisher: Journal of Beijing University of Traditional Chinese Medicine
  • Year: 2019
  • Issue: v.42
  • Funding: National Natural Science Foundation of China, General Program (No. 81573716)
  • Language: Chinese
  • Pages: JZYB201901008
  • Page count: 7
  • CN: 01
  • ISSN: 11-3574/R
  • Classification number: 34-40
Abstract
Objective: To build a new virtual screening model for predicting Chinese medicinal compounds with anti-fibrosis effects, and to validate its predictive performance. Methods: Random forest (RF) and gradient boosting decision tree (GBDT) algorithms were compared as means of dimension reduction and feature optimization of compound molecular fingerprints. A hybrid "feature optimization + machine learning" model was then built by feeding the optimized features into logistic regression (LR) and an artificial neural network (ANN) for training. Precision, recall and F1 score were used to evaluate each model combination, and the virtual screening model was selected on the basis of these results. Its predictions of the anti-fibrosis activity of Chinese medicinal compounds were then compared with those of a molecular docking model to further verify its predictive performance. Results: RF: precision 0.76, recall 0.75, F1 0.74, area under the curve (AUC) 0.818; GBDT: precision 0.76, recall 0.74, F1 0.72, AUC 0.829; ANN: precision 0.75, recall 0.75, F1 0.75, AUC 0.802; RF+LR: precision 0.77, recall 0.76, F1 0.75, AUC 0.840; RF+ANN: precision 0.74, recall 0.84, F1 0.79, AUC 0.850; GBDT+LR: precision 0.80, recall 0.80, F1 0.79, AUC 0.872; GBDT+ANN: precision 0.73, recall 0.91, F1 0.81, AUC 0.837. For the Chinese medicinal compounds curcumin, glycyrrhizic acid, hydroxysafflor yellow A, emodin and gypenoside, the molecular docking activity results were consistent with the predictions of the model. Conclusion: The GBDT+LR model performed better than the other models. Comparison with the molecular docking model further confirmed the stability of the model in predicting Chinese medicinal compounds. Because the model supports high-throughput screening, it can compensate for the limited screening efficiency of molecular docking, and it offers a new method for virtual screening of Chinese medicinal compounds with anti-fibrosis effects.
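The hybrid "feature optimization + machine learning" idea described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not in the source: scikit-learn is used, the fingerprints are random 256-bit stand-ins rather than real compound data, and "feature optimization" is interpreted as keeping the bits whose GBDT feature importance exceeds the mean. The abstract does not state how the features were actually selected, so this is one plausible reading, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical stand-in data: 600 "compounds" with 256-bit fingerprints.
# Activity depends only on the first 8 bits, so most bits are uninformative.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(600, 256))
y = (X[:, :8].sum(axis=1) + rng.normal(0, 0.5, size=600) > 4).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

hybrid = Pipeline([
    # Step 1 (assumed mechanism): keep the fingerprint bits whose GBDT
    # importance is at or above the mean importance over all bits.
    ("select", SelectFromModel(
        GradientBoostingClassifier(n_estimators=100, random_state=0),
        threshold="mean",
    )),
    # Step 2: train logistic regression on the reduced fingerprint.
    ("clf", LogisticRegression(max_iter=1000)),
])
hybrid.fit(X_train, y_train)

# Evaluate with the metrics named in the abstract.
y_pred = hybrid.predict(X_test)
y_score = hybrid.predict_proba(X_test)[:, 1]
n_kept = int(hybrid.named_steps["select"].get_support().sum())
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_score)

print(f"bits kept: {n_kept} of {X.shape[1]}")
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f} AUC={auc:.3f}")
```

Swapping `GradientBoostingClassifier` for `RandomForestClassifier`, or `LogisticRegression` for `MLPClassifier`, would give the other combinations compared in the abstract (RF+LR, RF+ANN, GBDT+ANN) under the same assumptions.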