Evaluation of the performance of various machine learning methods on the discrimination of the active compounds.

Author information

Pharmaceutical Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran.

Publication information

Chem Biol Drug Des. 2021 Apr;97(4):930-943. doi: 10.1111/cbdd.13819. Epub 2021 Jan 10.

DOI:10.1111/cbdd.13819

PMID:33370504

Abstract

The performance of machine learning (ML) methods, including deep learning (DL), was evaluated on a diverse dataset with and without feature selection (FS). Superior performance of DL on small datasets has not been demonstrated previously, yet the datasets available for newly identified targets are usually limited in size. The study explored whether FS, hyperparameter search, and the use of an ensemble model can improve ML and DL performance on small datasets. QSAR classification models were developed using K-nearest neighbors (KN), DL, random forest (RF), naïve Bayesian (NB) classification, support vector machine (SVM), and logistic regression (LR). In general, the best individual performers were DL and SVM; on the small subsets, LR performed similarly to both. Nested cross-validation made it possible to combine different feature vectors with different ML methods into an ensemble model whose performance on the datasets was similar to that of the best performers. The baseline NB model had an overall Matthews correlation coefficient of 0.356, which improved to around 0.66 with NB-assisted FS followed by SVM/DL classification, and to around 0.63 with an ensemble model.
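The abstract reports its results as Matthews correlation coefficients (MCC). As a reminder of what that metric measures, the following is a minimal sketch of the standard binary MCC computed from confusion-matrix counts. The function names and the toy labels are hypothetical and chosen only for illustration; they are not the authors' code or data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels, where 1 = active and 0 = inactive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc_from_counts(tp, tn, fp, fn):
    """Standard binary MCC: (TP*TN - FP*FN) / sqrt of the product of the four marginals."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is taken as 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical toy labels (illustration only, not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(mcc_from_counts(*confusion_counts(y_true, y_pred)))
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction, which is why it is often preferred over accuracy when active and inactive compounds are imbalanced.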

Similar articles

Evaluation of the performance of various machine learning methods on the discrimination of the active compounds.

Chem Biol Drug Des. 2021 Apr;97(4):930-943. doi: 10.1111/cbdd.13819. Epub 2021 Jan 10.

Clinically Applicable Deep Learning Algorithm Using Quantitative Proteomic Data.

J Proteome Res. 2019 Aug 2;18(8):3195-3202. doi: 10.1021/acs.jproteome.9b00268. Epub 2019 Jul 17.

Do we need different machine learning algorithms for QSAR modeling? A comprehensive assessment of 16 machine learning algorithms on 14 QSAR data sets.

Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa321.

Machine Learning Approaches to Predict Chronic Lower Back Pain in People Aged over 50 Years.

Medicina (Kaunas). 2021 Nov 11;57(11):1230. doi: 10.3390/medicina57111230.

Deep Learning and Machine Learning with Grid Search to Predict Later Occurrence of Breast Cancer Metastasis Using Clinical Data.

J Clin Med. 2022 Sep 29;11(19):5772. doi: 10.3390/jcm11195772.

Machine learning-enabled predictive modeling to precisely identify the antimicrobial peptides.

Med Biol Eng Comput. 2021 Nov;59(11-12):2397-2408. doi: 10.1007/s11517-021-02443-6. Epub 2021 Oct 11.

Classification and QSAR models of leukotriene A4 hydrolase (LTA4H) inhibitors by machine learning methods.

SAR QSAR Environ Res. 2021 May;32(5):411-431. doi: 10.1080/1062936X.2021.1910862. Epub 2021 Apr 26.

Development and validation of consensus machine learning-based models for the prediction of novel small molecules as potential anti-tubercular agents.

Mol Divers. 2022 Jun;26(3):1345-1356. doi: 10.1007/s11030-021-10238-y. Epub 2021 Jun 10.

Relevance Vector Machines: Sparse Classification Methods for QSAR.

J Chem Inf Model. 2015 Aug 24;55(8):1529-34. doi: 10.1021/acs.jcim.5b00261. Epub 2015 Jul 21.

Bioactivity Comparison across Multiple Machine Learning Algorithms Using over 5000 Datasets for Drug Discovery.

Mol Pharm. 2021 Jan 4;18(1):403-415. doi: 10.1021/acs.molpharmaceut.0c01013. Epub 2020 Dec 16.

Cited by

Application of machine learning algorithms in predicting HIV infection among men who have sex with men: Model development and validation.

Front Public Health. 2022 Aug 25;10:967681. doi: 10.3389/fpubh.2022.967681. eCollection 2022.
