Shah Pranav, Siramshetty Vishal B, Mathé Ewy, Xu Xin
National Center for Advancing Translational Sciences (NCATS), 9808 Medical Center Drive, Rockville, MD 20850, USA.
Pharmaceutics. 2024 Sep 27;16(10):1257. doi: 10.3390/pharmaceutics16101257.
Pharmacokinetic issues were the leading cause of drug attrition before the turn of the century, accounting for approximately 40% of all cases. In response, several high-throughput in vitro assays, such as microsomal stability assays, have been developed to evaluate the pharmacokinetic profiles of compounds in the early stages of drug discovery. At NCATS, a single-point rat liver microsomal (RLM) stability assay is used as a Tier I assay, while human liver microsomal (HLM) stability is employed as a Tier II assay. We experimentally screened and collected data on over 30,000 compounds for RLM stability and over 7000 compounds for HLM stability. Although HLM stability screening provides valuable insights, the increasing number of hits generated, along with the time- and resource-intensive nature of the assay, highlights the need for alternative strategies. One promising approach is to leverage in silico models trained on these experimental datasets. We describe the development of an HLM stability prediction model using our in-house HLM stability dataset. Employing both classical machine learning methods and advanced techniques, such as neural networks, we achieved model accuracies exceeding 80%. Moreover, we validated our models using external test sets and found their performance to be comparable to that of some of the best models in the literature. Additionally, HLM model performance improved when RLM stability predictions were used as an input descriptor, which reinforces the strong correlation observed between our RLM and HLM data. The best model, along with a subset of our dataset (PubChem AID: 1963597), has been made publicly accessible on the ADME@NCATS website for the benefit of the greater drug discovery community. To the best of our knowledge, it is the largest open-source model of its kind and the first to leverage cross-species data.