Unraveling druggable cancer-driving proteins and targeted drugs using artificial intelligence and multi-omics analyses.

Author information

Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas, Quito, Ecuador.

Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito, Ecuador.

Publication information

Sci Rep. 2024 Aug 21;14(1):19359. doi: 10.1038/s41598-024-68565-7.

DOI: 10.1038/s41598-024-68565-7

PMID: 39169044

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11339426/

Abstract

The druggable proteome refers to proteins that can bind to small molecules with appropriate chemical affinity, inducing a favorable clinical response. Predicting druggable proteins through screening and in silico modeling is imperative for drug design. To contribute to this field, we developed an accurate predictive classifier for druggable cancer-driving proteins using amino acid composition descriptors of protein sequences and 13 linear and non-linear machine learning classifiers. The optimal classifier was achieved with the support vector machine method, utilizing 200 tri-amino acid composition descriptors. The high performance of the model is evident from an area under the receiver operating characteristic curve (AUROC) of 0.975 ± 0.003 and an accuracy of 0.929 ± 0.006 (threefold cross-validation). The machine learning prediction model was enhanced with multi-omics approaches, including the target-disease evidence score, the shortest pathways to cancer hallmarks, structure-based ligandability assessment, unfavorable prognostic protein analysis, and the oncogenic variome. Additionally, we performed a drug repurposing analysis to identify drugs with the highest affinity capable of targeting the best predicted proteins. As a result, we identified 79 key druggable cancer-driving proteins with the highest ligandability, and 23 of them demonstrated unfavorable prognostic significance across 16 TCGA PanCancer types: CDKN2A, BCL10, ACVR1, CASP8, JAG1, TSC1, NBN, PREX2, PPP2R1A, DNM2, VAV1, ASXL1, TPR, HRAS, BUB1B, ATG7, MARK3, SETD2, CCNE1, MUTYH, CDKN2C, RB1, and SMARCA4. Moreover, we prioritized 11 clinically relevant drugs targeting these proteins. This strategy effectively predicts and prioritizes biomarkers, therapeutic targets, and drugs for in-depth studies in clinical trials. Scripts are available at https://github.com/muntisa/machine-learning-for-druggable-proteins.
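To make the descriptor type named in the abstract concrete, here is a minimal sketch of tri-amino acid (tripeptide) composition, assuming the common definition: the relative frequency of each of the 8000 possible three-residue words across the overlapping windows of a sequence. The function name, the handling of non-standard residues, and the toy sequence are assumptions for illustration; the authors' actual pipeline is in the linked repository and may differ, and the selection of the 200 descriptors used by the final model is not reproduced here.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 20^3 = 8000 possible tripeptides, in a fixed order.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]


def tripeptide_composition(sequence):
    """Return the relative frequency of each tripeptide in `sequence`.

    Hypothetical helper: frequencies are taken over overlapping
    three-residue windows that contain only standard amino acids.
    """
    seq = sequence.upper()
    windows = [seq[i:i + 3] for i in range(len(seq) - 2)]
    # Assumption: windows with non-standard residues (e.g. X, B, Z, U) are skipped.
    valid = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    counts = Counter(valid)
    total = len(valid)
    if total == 0:
        # Sequence too short (or fully non-standard): return an all-zero vector.
        return {t: 0.0 for t in TRIPEPTIDES}
    return {t: counts[t] / total for t in TRIPEPTIDES}


if __name__ == "__main__":
    toy = "MKVLAAGMKV"  # hypothetical toy sequence, not a real protein
    tpc = tripeptide_composition(toy)
    nonzero = {k: round(v, 3) for k, v in tpc.items() if v > 0}
    print(len(tpc), nonzero)
```

Vectors built this way could then be passed to a classifier; for example, scikit-learn's `SVC` evaluated with `cross_val_score(..., cv=3, scoring="roc_auc")` would mirror the threefold cross-validated AUROC setup described above, though the kernel and hyperparameters the authors used are not stated in the abstract.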

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f4c1/11339426/ed36d50c2670/41598_2024_68565_Fig1_HTML.jpg
