Bio-activity prediction of drug candidate compounds targeting SARS-Cov-2 using machine learning approaches.

Affiliations

Department of Computer Science and Engineering, Brac University, Dhaka, Bangladesh.

Department of Computer Science and Engineering, University of California, Riverside, California, United States of America.

Publication information

PLoS One. 2023 Sep 5;18(9):e0288053. doi: 10.1371/journal.pone.0288053. eCollection 2023.

Abstract

The SARS-CoV-2 3CLpro protein is one of the key therapeutic targets for COVID-19, owing to its critical role in viral replication, the availability of several high-quality protein crystal structures, and its usefulness as a basis for computationally screening compounds with improved inhibitory activity, bioavailability, and ADMETox properties. The ChEMBL and PubChem databases contain experimental data from screening small molecules against SARS-CoV-2 3CLpro, which makes it possible to learn patterns from these data and design a computational model that predicts the potency of a drug compound against the coronavirus before in vitro and in vivo testing. In this study, we evaluated 27 machine learning classifiers using several molecular descriptors. We also developed a neural network model that distinguishes bioactive from inactive compounds with 91% accuracy on ChEMBL data and 93% accuracy on combined ChEMBL and PubChem data. The F1-scores for inactive and active compounds were 93% and 94%, respectively. We applied SHAP (SHapley Additive exPlanations) to an XGB classifier to identify the important fingerprints among the PaDEL descriptors for this task. The results indicated that the PaDEL descriptors were effective in predicting bioactivity, that the proposed neural network design was efficient, and that the SHAP-based explanation correctly identified the important fingerprints. In addition, we validated the effectiveness of the proposed model on a large dataset of more than 100,000 molecules. This research employed various molecular descriptors to find the optimal one for this task. Further in vitro and in vivo research is required to evaluate the effectiveness of these candidate compounds against SARS-CoV-2.
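
The abstract reports both an overall accuracy and a separate F1-score for each class (inactive and active). As a minimal sketch of how those two kinds of metric differ, the snippet below computes them from scratch on a toy set of labels. The labels, the function names `accuracy` and `per_class_f1`, and the encoding (1 = active, 0 = inactive) are assumptions made for illustration; they are not the authors' data or code.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of compounds whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """F1 for one class: harmonic mean of that class's precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only: 1 = active, 0 = inactive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

print("accuracy:     ", accuracy(y_true, y_pred))
print("F1 (active):  ", per_class_f1(y_true, y_pred, label=1))
print("F1 (inactive):", per_class_f1(y_true, y_pred, label=0))
```

Reporting F1 per class, as the abstract does, shows whether a classifier performs evenly on active and inactive compounds. Accuracy alone can hide a weak class when the dataset is imbalanced.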
