Aleksić Stevan, Seeliger Daniel, Brown J B
Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, 88397, Biberach, Germany.
Mol Inform. 2022 Feb;41(2):e2100113. doi: 10.1002/minf.202100113. Epub 2021 Sep 2.
Computational methods assisting drug discovery and development are routine in the pharmaceutical industry. Digital recording of ADMET assays has provided a rich source of data for the development of predictive models. Despite the accumulation of data and the public availability of advanced modeling algorithms, the utility of prediction in ADMET research is not clear. Here, we present a critical evaluation of the relationships between data volume, modeling algorithm, chemical representation and grouping, and temporal aspect (time sequence of assays) using an in-house ADMET database. We find no large difference between prediction algorithms, nor any systematic and substantial gain from increasingly large datasets. Temporal-based data enlargement led to performance improvement in only a limited number of assays, and with fractional improvement at best. Assays that are well-, intermediately-, or poorly-suited for ADMET predictions, and the reasons for such behavior, are systematically identified, generating realistic expectations for areas in which computational models can be used to guide decision making in molecular design and development.
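The temporal aspect mentioned in the abstract can be illustrated with a minimal sketch of time-ordered evaluation: the training set grows as the cutoff date moves forward, and each model is scored only on measurements recorded after that cutoff. The abstract does not describe the authors' actual protocol, so everything here is an assumption made for illustration: the record layout, the function names `expanding_time_splits` and `mean_baseline_mae`, the one-year test horizon, and the use of a training-mean predictor as a stand-in for a real model.

```python
from datetime import date
from typing import Iterator, List, Optional, Sequence, Tuple

# Hypothetical record layout: (assay date, compound id, measured value).
Record = Tuple[date, str, float]


def expanding_time_splits(
    records: Sequence[Record], cutoffs: Sequence[date], horizon_days: int
) -> Iterator[Tuple[date, List[Record], List[Record]]]:
    """Yield (cutoff, train, test) for each cutoff date.

    train: every record measured strictly before the cutoff.
    test:  records measured in [cutoff, cutoff + horizon_days).
    """
    ordered = sorted(records, key=lambda r: r[0])
    for cutoff in cutoffs:
        train = [r for r in ordered if r[0] < cutoff]
        test = [r for r in ordered if 0 <= (r[0] - cutoff).days < horizon_days]
        yield cutoff, train, test


def mean_baseline_mae(train: Sequence[Record], test: Sequence[Record]) -> Optional[float]:
    """Mean absolute error when predicting the training mean for every test record.

    This is only a placeholder for a real predictive model.
    """
    if not train or not test:
        return None
    mean = sum(r[2] for r in train) / len(train)
    return sum(abs(r[2] - mean) for r in test) / len(test)


if __name__ == "__main__":
    # Made-up toy values, not data from the study.
    toy = [
        (date(2019, 1, 10), "c1", 1.0),
        (date(2019, 6, 5), "c2", 2.0),
        (date(2020, 2, 1), "c3", 3.0),
        (date(2020, 8, 15), "c4", 4.0),
        (date(2021, 3, 1), "c5", 5.0),
    ]
    for cutoff, train, test in expanding_time_splits(
        toy, [date(2020, 1, 1), date(2021, 1, 1)], horizon_days=365
    ):
        print(cutoff, len(train), len(test), mean_baseline_mae(train, test))
```

Comparing the error across successive cutoffs shows whether adding later data helps on a given assay. A random split would not answer that question, because it mixes future measurements into the training set.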