Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
Research Center of Biotechnology RAS, Moscow 119071, Russia.
Mol Pharm. 2022 Feb 7;19(2):674-689. doi: 10.1021/acs.molpharmaceut.1c00791. Epub 2021 Dec 29.
Tuberculosis (TB) is a major global health challenge, with approximately 1.4 million deaths per year. There is still a need to develop novel treatments for patients infected with Mycobacterium tuberculosis (Mtb). Many large-scale phenotypic screens have led to the identification of thousands of new compounds. Yet there is very limited investment in TB drug discovery, which points to the need for new methods to increase the efficiency of drug discovery against Mtb. We have used machine learning approaches to learn from the public data, resulting in many data sets and models with robust enrichment and hit rates, leading to the discovery of new active compounds. Recently, we curated predominantly small-molecule Mtb data and developed new machine learning classification models with 18 886 molecules at different activity cutoffs. We now describe the further validation of these Bayesian models using a library of over 1000 molecules synthesized as part of the EU-funded New Medicines for TB and More Medicines for TB programs. We highlight molecular features that are enriched in these active compounds. In addition, we provide new regression and classification models that can be used to score compound libraries or to design new molecules. We have also visualized these molecules in the context of known molecular targets and identified clusters in chemical property space, which may aid future target identification efforts. Finally, we are making these data sets publicly available, representing a significant increase in the Mtb inhibition data available in the public domain.
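The abstract refers to classification at "different activity cutoffs" and to "enrichment and hit rates" without defining them. The following is a minimal sketch of how such quantities are commonly computed, not the authors' implementation. The function names, the use of MIC values in µM, the specific cutoffs, and the toy data are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: names, units (MIC in uM), cutoffs, and data
# below are assumptions, not taken from the paper.

def label_actives(mic_values_um, cutoff_um):
    """Binarize activity: a molecule counts as active if MIC <= cutoff."""
    return [mic <= cutoff_um for mic in mic_values_um]


def hit_rate(labels):
    """Fraction of molecules labeled active (0.0 for an empty list)."""
    return sum(labels) / len(labels) if labels else 0.0


def enrichment_factor(scores, labels, top_n):
    """Hit rate among the top_n highest-scored molecules divided by the
    hit rate of the whole set. Returns 0.0 if the set has no actives."""
    baseline = hit_rate(labels)
    if baseline == 0.0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_labels = [label for _, label in ranked[:top_n]]
    return hit_rate(top_labels) / baseline


# Hypothetical measured MICs (uM) and hypothetical model scores
# (higher score = predicted more likely to be active).
toy_mic_um = [0.5, 12.0, 0.08, 50.0, 3.0, 100.0, 0.9, 25.0]
toy_scores = [0.91, 0.40, 0.88, 0.10, 0.35, 0.05, 0.30, 0.75]

if __name__ == "__main__":
    for cutoff in (0.1, 1.0, 10.0):
        labels = label_actives(toy_mic_um, cutoff)
        ef = enrichment_factor(toy_scores, labels, top_n=3)
        print(f"cutoff={cutoff} uM  hit rate={hit_rate(labels):.3f}  EF(top 3)={ef:.2f}")
```

An enrichment factor above 1 means the top-ranked molecules contain proportionally more actives than the library as a whole. Because the set of actives changes with the cutoff, the same score ranking can yield different enrichment at each cutoff, which is one reason to build and validate a separate model per cutoff.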