Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas, Quito, Ecuador.
Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito, Ecuador.
Sci Rep. 2024 Aug 21;14(1):19359. doi: 10.1038/s41598-024-68565-7.
The druggable proteome refers to proteins that can bind small molecules with appropriate chemical affinity, inducing a favorable clinical response. Predicting druggable proteins through screening and in silico modeling is imperative for drug design. To contribute to this field, we developed an accurate predictive classifier for druggable cancer-driving proteins using amino acid composition descriptors of protein sequences and 13 linear and non-linear machine learning classifiers. The best classifier was a support vector machine using 200 tri-amino acid composition descriptors. Its high performance is evident from an area under the receiver operating characteristic curve (AUROC) of 0.975 ± 0.003 and an accuracy of 0.929 ± 0.006 (threefold cross-validation). The machine learning prediction model was enhanced with multi-omics approaches, including the target-disease evidence score, the shortest pathways to cancer hallmarks, structure-based ligandability assessment, unfavorable prognostic protein analysis, and the oncogenic variome. Additionally, we performed a drug repurposing analysis to identify the highest-affinity drugs capable of targeting the best predicted proteins. As a result, we identified 79 key druggable cancer-driving proteins with the highest ligandability, 23 of which demonstrated unfavorable prognostic significance across 16 TCGA PanCancer types: CDKN2A, BCL10, ACVR1, CASP8, JAG1, TSC1, NBN, PREX2, PPP2R1A, DNM2, VAV1, ASXL1, TPR, HRAS, BUB1B, ATG7, MARK3, SETD2, CCNE1, MUTYH, CDKN2C, RB1, and SMARCA4. Moreover, we prioritized 11 clinically relevant drugs targeting these proteins. This strategy effectively predicts and prioritizes biomarkers, therapeutic targets, and drugs for in-depth study in clinical trials. Scripts are available at https://github.com/muntisa/machine-learning-for-druggable-proteins.
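To make the descriptor type concrete, the following is a minimal sketch of how a tri-amino acid (tripeptide) composition vector could be computed from a protein sequence. It is not taken from the authors' scripts: the function name `tri_amino_acid_composition`, the normalization by the number of valid windows, and the toy sequence are all assumptions made for illustration. The full vector has 20³ = 8000 entries; the abstract reports that the final model used 200 such descriptors, and the selection step that reduces 8000 to 200 is not shown here.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
_VALID = set(AMINO_ACIDS)

# All 20^3 = 8000 possible tripeptides, in a fixed order that defines the feature index.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]


def tri_amino_acid_composition(sequence):
    """Return the 8000-dimensional tripeptide frequency vector of a sequence.

    Assumption (illustrative only): each frequency is the count of a tripeptide
    divided by the number of overlapping 3-residue windows that contain only
    standard amino acids.
    """
    seq = sequence.upper()
    # Overlapping windows of length 3; skip any window with a non-standard residue.
    windows = [seq[i:i + 3] for i in range(len(seq) - 2)]
    valid = [w for w in windows if all(c in _VALID for c in w)]
    if not valid:
        return [0.0] * len(TRIPEPTIDES)
    counts = Counter(valid)
    total = len(valid)
    return [counts[t] / total for t in TRIPEPTIDES]


# Hypothetical toy sequence, not a real protein.
toy_sequence = "MKVLAAGIVMKV"
features = tri_amino_acid_composition(toy_sequence)
mkv_frequency = features[TRIPEPTIDES.index("MKV")]
print(len(features), mkv_frequency)
```

Vectors of this kind, one per protein, could then be passed to any standard classifier; a support vector machine evaluated with threefold cross-validation is the configuration the abstract reports as best.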