National Center for Toxicological Research, U.S. Food & Drug Administration, Jefferson, AR 72079, USA.
Exp Biol Med (Maywood). 2023 Nov;248(21):1927-1936. doi: 10.1177/15353702231209413. Epub 2023 Nov 24.
The coronavirus disease 2019 (COVID-19) global pandemic resulted in millions of infections with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and close to seven million deaths worldwide. The SARS-CoV-2 main protease is a major target for COVID-19 drugs, and it remains essential to discover and design effective treatments that act on it. In this study, machine learning was applied to predict the binding of Food and Drug Administration (FDA)-approved drugs to the SARS-CoV-2 main protease in order to identify potential candidates for repurposing as COVID-19 treatments. Ligands bound to the SARS-CoV-2 main protease in the Protein Data Bank and compounds tested in SARS-CoV-2 main protease binding assays reported in the literature were curated and divided into training (516 chemicals) and testing (360 chemicals) data sets. To screen for repurposing candidates, 1188 FDA-approved drugs were obtained from the Liver Toxicity Knowledge Base. Predictive models were constructed with a random forest algorithm using molecular descriptors calculated with the Mold2 software. Model performance, evaluated with 100 iterations of fivefold cross-validation, reached a balanced accuracy of 78.8%. A random forest model built on the entire training data set was then used to predict SARS-CoV-2 main protease binding for the testing set and the FDA-approved drugs. Filtering the drugs predicted to be main protease binders by model applicability domain and prediction confidence identified 10 FDA-approved drugs as potential candidates for repurposing to treat COVID-19. Our results demonstrate that machine learning is an efficient approach to drug repurposing and may therefore accelerate the development of drugs targeting SARS-CoV-2.
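The evaluation protocol above (repeated fivefold cross-validation scored by balanced accuracy, i.e. the mean of sensitivity and specificity) can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' code: it assumes the Mold2 descriptors have already been computed into a numeric matrix `X` with binary labels `y` (1 = main protease binder), it uses stratified splitting and scores the pooled out-of-fold predictions once per iteration, and a simple nearest-centroid classifier stands in for the random forest used in the study. The names `repeated_cv`, `stratified_folds`, and `nearest_centroid` are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

# A "trainer" fits on (X, y) and returns a predict(x) -> 0/1 function.
Trainer = Callable[[List[List[float]], List[int]], Callable[[List[float]], int]]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on binders) and specificity (recall on non-binders)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


def stratified_folds(y: Sequence[int], k: int, rng: random.Random) -> List[List[int]]:
    """Assumption: shuffle each class separately and deal its indices round-robin into k folds."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def repeated_cv(X: List[List[float]], y: List[int], train: Trainer,
                k: int = 5, iterations: int = 100, seed: int = 0) -> List[float]:
    """Return one balanced accuracy per iteration, computed on pooled out-of-fold predictions."""
    rng = random.Random(seed)
    scores: List[float] = []
    for _ in range(iterations):
        preds = [0] * len(y)
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i in test_idx:
                preds[i] = model(X[i])
        scores.append(balanced_accuracy(y, preds))
    return scores


def nearest_centroid(X: List[List[float]], y: List[int]) -> Callable[[List[float]], int]:
    """Stand-in for random forest: predict the class whose mean descriptor vector is closest."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def predict(x: List[float]) -> int:
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        return 1 if d1 < d0 else 0

    return predict


if __name__ == "__main__":
    data_rng = random.Random(42)
    labels = [1] * 20 + [0] * 30
    # Synthetic 3-descriptor data: binders centered at 1.0, non-binders at 0.0.
    descriptors = [[data_rng.gauss(1.0 if t else 0.0, 0.3) for _ in range(3)] for t in labels]
    scores = repeated_cv(descriptors, labels, nearest_centroid, k=5, iterations=10, seed=1)
    print(f"mean balanced accuracy over {len(scores)} iterations: {mean(scores):.3f}")
```

Balanced accuracy is used rather than plain accuracy because binders and non-binders are rarely present in equal numbers; it weights both classes equally. In practice, a random forest (for example scikit-learn's `RandomForestClassifier`) would replace `nearest_centroid` by wrapping its `fit` and `predict` methods in the same trainer signature.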