ADMET evaluation in drug discovery: 21. Application and industrial validation of machine learning algorithms for Caco-2 permeability prediction.

Author information

Wang Dong, Jin Jieyu, Shi Guqin, Bao Jingxiao, Wang Zheng, Li Shimeng, Pan Peichen, Li Dan, Kang Yu, Hou Tingjun

Affiliations

Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.

Shanghai Qilu Pharmaceutical R&D Center, 576 Libing Road, Pudong New Area District, Shanghai, 310115, China.

Publication information

J Cheminform. 2025 Jan 10;17(1):3. doi: 10.1186/s13321-025-00947-z.

DOI:10.1186/s13321-025-00947-z

PMID:39794857

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11724520/

Abstract

The Caco-2 cell model has been widely used to assess the intestinal permeability of drug candidates in vitro, owing to its morphological and functional similarity to human enterocytes. While the Caco-2 cell assay is considered safe and cost-effective, it is also time-consuming. Therefore, computational models that predict Caco-2 permeability with high accuracy are crucial for improving the efficiency of oral drug development. In this study, we conducted an in-depth analysis of the characteristics of an augmented Caco-2 permeability dataset and evaluated a diverse range of machine learning algorithms combined with different molecular representations. The results indicated that XGBoost generally provided better predictions on the test sets than the other models evaluated. In addition, we investigated the transferability of machine learning models trained on publicly available data to internal pharmaceutical industry datasets. Our findings, based on Shanghai Qilu's in-house dataset, showed that the boosting models retained a degree of predictive power when applied to industry data. Furthermore, a Y-randomization test and applicability domain analysis were employed to assess the robustness and generalizability of these models. Matched molecular pair analysis (MMPA) was used to extract chemical transformation rules. We believe that the model developed in this study could serve as a reliable tool for assessing Caco-2 permeability during early-stage drug discovery, and that the chemical transformation rules derived here could provide insights for optimizing Caco-2 permeability.

Scientific contribution

A comprehensive validation of various machine learning algorithms combined with diverse molecular representations for predicting Caco-2 permeability on a large dataset was reported. The transferability of machine learning models trained on publicly available data to internal pharmaceutical industry datasets was also investigated. Matched molecular pair analysis was carried out to provide reasonable suggestions for researchers seeking to improve the Caco-2 permeability of compounds.
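Y-randomization, one of the robustness checks named in the abstract, can be illustrated with a short Python sketch. This is not the authors' code, and the abstract does not describe their exact protocol. The sketch assumes a single hypothetical descriptor, synthetic data, and an ordinary least-squares fit standing in for XGBoost and the molecular representations used in the study; the helper names (`fit_linear`, `r_squared`, `y_randomization`) and the 100 shuffling rounds are illustrative choices. The idea is that shuffling the response values destroys any genuine structure-permeability relationship, so a model refit on shuffled labels should score far lower than the model fit on the true labels.

```python
import random
import statistics


# Hypothetical sketch: a one-descriptor least-squares model stands in for XGBoost.
def fit_linear(x, y):
    """Ordinary least-squares fit of y = a + b * x for a single descriptor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y, a, b):
    """Coefficient of determination of the line a + b * x on the data (x, y)."""
    my = statistics.fmean(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def y_randomization(x, y, n_rounds=100, seed=0):
    """Return the R^2 on the true labels and a list of R^2 values after label shuffling.

    Each round permutes y (breaking any real x-y relationship), refits the model
    from scratch and records its R^2.
    """
    rng = random.Random(seed)
    real_r2 = r_squared(x, y, *fit_linear(x, y))
    scrambled = []
    for _ in range(n_rounds):
        y_perm = list(y)  # copy so the true labels are left untouched
        rng.shuffle(y_perm)
        scrambled.append(r_squared(x, y_perm, *fit_linear(x, y_perm)))
    return real_r2, scrambled


# Synthetic toy data (assumption): a hypothetical descriptor and noisy log Papp-like labels.
data_rng = random.Random(42)
x = [data_rng.uniform(20.0, 140.0) for _ in range(200)]
y = [-4.5 - 0.012 * xi + data_rng.gauss(0.0, 0.15) for xi in x]

real_r2, scrambled = y_randomization(x, y)
print(f"true-label R2 = {real_r2:.2f}; best scrambled R2 = {max(scrambled):.2f}")
```

If the true-label R² does not stand clearly above the whole scrambled distribution, the model's apparent performance is more likely to reflect chance correlation than a learned structure-permeability relationship. In a real study the same loop would wrap the actual model and representation, ideally scoring on held-out data rather than in-sample.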


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/25c0/11724520/8fee663c9100/13321_2025_947_Fig1_HTML.jpg

Similar articles

ADMET evaluation in drug discovery: 21. Application and industrial validation of machine learning algorithms for Caco-2 permeability prediction.

J Cheminform. 2025 Jan 10;17(1):3. doi: 10.1186/s13321-025-00947-z.

ADME Properties Evaluation in Drug Discovery: Prediction of Caco-2 Cell Permeability Using a Combination of NSGA-II and Boosting.

J Chem Inf Model. 2016 Apr 25;56(4):763-73. doi: 10.1021/acs.jcim.5b00642. Epub 2016 Apr 5.

QSPR model for Caco-2 cell permeability prediction using a combination of HQPSO and dual-RBF neural network.

RSC Adv. 2020 Nov 26;10(70):42938-42952. doi: 10.1039/d0ra08209k. eCollection 2020 Nov 23.

Reliable Prediction of Caco-2 Permeability by Supervised Recursive Machine Learning Approaches.

Pharmaceutics. 2022 Sep 21;14(10):1998. doi: 10.3390/pharmaceutics14101998.

Predicting Elimination of Small-Molecule Drug Half-Life in Pharmacokinetics Using Ensemble and Consensus Machine Learning Methods.

J Chem Inf Model. 2024 Apr 22;64(8):3080-3092. doi: 10.1021/acs.jcim.3c02030. Epub 2024 Apr 2.

A Machine Learning Approach for Predicting Caco-2 Cell Permeability in Natural Products from the Biodiversity in Peru.

Pharmaceuticals (Basel). 2024 Jun 7;17(6):750. doi: 10.3390/ph17060750.

Prediction of blood-brain barrier permeability using machine learning approaches based on various molecular representation.

Mol Inform. 2024 Sep;43(9):e202300327. doi: 10.1002/minf.202300327. Epub 2024 Jun 12.

Ensemble machine learning model trained on a new synthesized dataset generalizes well for stress prediction using wearable devices.

J Biomed Inform. 2023 Dec;148:104556. doi: 10.1016/j.jbi.2023.104556. Epub 2023 Dec 2.

Industry-scale application and evaluation of deep learning for drug target prediction.

J Cheminform. 2020 Apr 19;12(1):26. doi: 10.1186/s13321-020-00428-5.

Systematic Modeling of log  Based on Ensemble Machine Learning, Group Contribution, and Matched Molecular Pair Analysis.

J Chem Inf Model. 2020 Jan 27;60(1):63-76. doi: 10.1021/acs.jcim.9b00718. Epub 2020 Jan 10.

