Exploring graph-based models for predicting active compounds against triple-negative breast cancer.

Author information

Mahanta Hridoy Jyoti, Boruah Amarjeet, Phukan Bikram, Chutia Hillul, Bharali Pankaj, Nagamani Selvaraman

Affiliations

Advanced Computation and Data Sciences Division, CSIR-North East Institute of Science and Technology, Jorhat, 785006, Assam, India.

Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201 002, Uttar Pradesh, India.

Publication information

Mol Divers. 2025 Jul 9. doi: 10.1007/s11030-025-11283-7.

Abstract

Breast cancer is among the most prevalent and rapidly rising cancers, both in India and around the world. Triple-negative breast cancer (TNBC) is one of the most aggressive subtypes of breast cancer, distinguished by the absence of HER2, progesterone receptor, and estrogen receptor expression. This absence limits treatment options, emphasizing the urgent need to discover or design new drug candidates for TNBC. Integrating artificial intelligence and machine learning into computational modeling has significantly accelerated the analysis of large-scale biological data and improved the prediction of therapeutic outcomes. In this study, we curated a data set of 756 mutant-type compounds from three cell lines and developed four graph-based models to predict active compounds against TNBC. Validated using stratified nested tenfold cross-validation and optimized with the Optuna framework, the models achieved AUC values of 0.65-0.82, with the MPNN model outperforming the others. Furthermore, key structural fragments associated with cell inhibition and model predictions were identified and interpreted using several explainability techniques. Validation with an external set of FDA-approved drugs yielded prediction accuracies ranging from 66% to 97%, highlighting the robustness of the models in identifying compounds with potential inhibitory activity against TNBC cells.
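The validation protocol named in the abstract (stratified nested cross-validation scored by AUC) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: a plain grid search stands in for Optuna, a toy single-feature scorer stands in for the graph models, and every name here (`stratified_folds`, `roc_auc`, `nested_cv`, `fit`, `predict`, the toy data) is hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)  # deal each class round-robin across folds
    return folds


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(X, y, train, test, params, fit, predict):
    """Fit on the train indices and return the AUC on the test indices."""
    model = fit([X[i] for i in train], [y[i] for i in train], params)
    return roc_auc([y[i] for i in test], predict(model, [X[i] for i in test]))


def nested_cv(X, y, fit, predict, param_grid, k_outer=10, k_inner=10, seed=0):
    """Outer loop estimates performance; inner loop picks hyperparameters on training data only."""
    outer = stratified_folds(y, k_outer, seed)
    outer_aucs = []
    for f, test in enumerate(outer):
        train = [i for g, fold in enumerate(outer) if g != f for i in fold]
        inner = stratified_folds([y[i] for i in train], k_inner, seed + 1)
        inner = [[train[p] for p in fold] for fold in inner]  # positions -> dataset indices

        def mean_inner_auc(params):
            aucs = []
            for h, val in enumerate(inner):
                fit_idx = [i for g, fold in enumerate(inner) if g != h for i in fold]
                aucs.append(evaluate(X, y, fit_idx, val, params, fit, predict))
            return sum(aucs) / len(aucs)

        best = max(param_grid, key=mean_inner_auc)  # grid search (assumption: stands in for Optuna)
        outer_aucs.append(evaluate(X, y, train, test, best, fit, predict))
    return outer_aucs


# Toy stand-in model (assumption): the only "hyperparameter" is which feature to use as the score.
def fit(X_train, y_train, params):
    return params["feature"]


def predict(model, X_test):
    return [x[model] for x in X_test]


y = [0] * 6 + [1] * 6
X = [[float(label), 0.0] for label in y]  # feature 0 is informative, feature 1 is constant
grid = [{"feature": 0}, {"feature": 1}]
aucs = nested_cv(X, y, fit, predict, grid, k_outer=3, k_inner=2)
print(aucs)
```

In this toy setup the inner loop should select the informative feature in every outer fold, so each outer AUC is 1.0. The point of the nesting is that the held-out outer fold never influences hyperparameter choice, which is what keeps the reported AUC from being optimistically biased.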
