Trinh The-Chuong, Phan Tieu-Long, To Van-Thinh, Pham Thanh-An, Truong Gia-Bao, Le Lai Hoang Son, Tran Xuan-Truc Dinh, Truong Tuyen Ngoc
Université Grenoble Alpes, Laboratoire Biosciences et Bioingénierie pour la Santé, UA13 INSERM-CEA-UGA, 3800, Grenoble, France.
Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics and School for Embedded and Composite Artificial Intelligence (SECAI), Leipzig University, Härtelstraße 16-18, 04107, Leipzig, Germany.
J Comput Aided Mol Des. 2025 Sep 15;39(1):79. doi: 10.1007/s10822-025-00657-6.
This study addresses the urgent need for an AI model to predict Anaplastic Lymphoma Kinase (ALK) inhibitors for the treatment of ALK-positive Non-Small Cell Lung Cancer. With only five Food and Drug Administration-approved ALK inhibitors currently available, effective drugs remain in demand. Leveraging machine learning (ML) and deep learning (DL), our research accelerates the screening of novel ALK inhibitors using both ligand-based and structure-based approaches. In the ligand-based approach, an ensemble voting model comprising three base learners was built to classify potential ALK inhibitors, achieving promising retrospective validation results. Notably, the ML-based XGBoost algorithm achieved an external validation (EV) F1 score of 0.921, an EV average precision (AP) of 0.961, a cross-validation (CV) F1 score of [Formula: see text], and a CV-AP of [Formula: see text]. The DL-based Artificial Neural Network (ANN) model demonstrated comparable performance, with an EV-F1 score of 0.930, an EV-AP of 0.955, a CV-F1 score of [Formula: see text], and a CV-AP of [Formula: see text]. In the structure-based approach, an XGBoost consensus docking model used scores from three molecular docking programs (GNINA 1.0, Vina-GPU 2.0, and AutoDock-GPU) as features. Combining the two approaches, we virtually screened 120,571 compounds and identified three promising ALK inhibitors, CHEMBL1689515, CHEMBL2380351, and CHEMBL102714, that bind to the protein's pocket and establish hydrophobic contacts in the hinge region through their ketone groups, resembling Alectinib's interaction. Comparative analysis revealed that traditional ML models outperformed Graph Neural Networks (GNN), highlighting the importance of feature engineering and dataset size. The study recommends further in vitro testing to validate the prospective screening performance of these models.
A graphical user interface is available at https://huggingface.co/spaces/thechuongtrinh/ALK_inhibitors_classification .
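As a rough illustration of the ligand-based workflow described above, the following minimal sketch combines three base learners by voting and scores the result with the F1 score and average precision. It is not the authors' implementation: the abstract does not state the voting rule, so soft voting (averaging positive-class probabilities) is assumed, and the probabilities, labels, learner names, and the 0.5 threshold are all hypothetical toy values.

```python
from typing import List, Sequence


def soft_vote(base_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average positive-class probabilities across base learners (assumed soft voting)."""
    n_learners = len(base_probs)
    return [sum(column) / n_learners for column in zip(*base_probs)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values at the rank of each true positive.

    Assumes there are no tied scores; ties would need explicit handling.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical positive-class probabilities for six toy compounds,
# one row per base learner (names and values are illustrative only).
base_probs = [
    [0.9, 0.8, 0.4, 0.3, 0.6, 0.1],  # learner A
    [0.8, 0.6, 0.7, 0.2, 0.5, 0.2],  # learner B
    [0.7, 0.9, 0.3, 0.4, 0.5, 0.1],  # learner C
]
y_true = [1, 1, 1, 0, 0, 0]  # 1 = active, 0 = inactive (toy labels)

ensemble = soft_vote(base_probs)
y_pred = [1 if p >= 0.5 else 0 for p in ensemble]  # assumed 0.5 decision threshold

f1 = f1_score(y_true, y_pred)
ap = average_precision(y_true, ensemble)
print(f"F1 = {f1:.3f}, AP = {ap:.3f}")
```

The same averaging idea carries over to the structure-based side only loosely: there the abstract describes docking scores from three programs being fed as features to an XGBoost model, which learns the combination rather than taking a fixed average.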