How to approach machine learning-based prediction of drug/compound-target interactions.

Author information

Atas Guvenilir Heval, Doğan Tunca

Affiliations

Biological Data Science Laboratory, Department of Computer Engineering, Hacettepe University, Ankara, Turkey.

Department of Health Informatics, Graduate School of Informatics, METU, Ankara, Turkey.

Publication information

J Cheminform. 2023 Feb 6;15(1):16. doi: 10.1186/s13321-023-00689-w.

DOI:10.1186/s13321-023-00689-w

PMID:36747300

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9901167/

Abstract

The identification of drug/compound-target interactions (DTIs) constitutes the basis of drug discovery, for which computational predictive approaches have been developed. As a relatively new data-driven paradigm, proteochemometric (PCM) modeling utilizes both protein and compound properties as a pair at the input level and processes them via statistical/machine learning. The representation of input samples (i.e., proteins and their ligands) in the form of quantitative feature vectors is crucial for the extraction of interaction-related properties during the artificial learning and subsequent prediction of DTIs. Lately, the representation learning approach, in which input samples are automatically featurized via training and applying a machine/deep learning model, has been utilized in biomedical sciences. In this study, we performed a comprehensive investigation of different computational approaches/techniques for protein featurization (including both conventional approaches and the novel learned embeddings), data preparation and exploration, machine learning-based modeling, and performance evaluation with the aim of achieving better data representations and more successful learning in DTI prediction. For this, we first constructed realistic and challenging benchmark datasets on small, medium, and large scales to be used as reliable gold standards for specific DTI modeling tasks. We developed and applied a network analysis-based splitting strategy to divide datasets into structurally different training and test folds. Using these datasets together with various featurization methods, we trained and tested DTI prediction models and evaluated their performance from different angles. 
Our main findings can be summarized under three items: (i) random splitting of datasets into train and test folds leads to near-complete data memorization and produces highly over-optimistic results, and should therefore be avoided; (ii) learned protein sequence embeddings work well in DTI prediction and offer high potential, even though interaction-related properties of proteins (e.g., structures) are not used during their self-supervised model training; and (iii) during the learning process, PCM models tend to rely heavily on compound features while partially ignoring protein features, primarily due to the inherent bias in DTI data, which indicates the need for new and unbiased datasets. We hope this study will aid researchers in designing robust and high-performing data-driven DTI prediction systems that have real-world translational value in drug discovery.
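To make the contrast between random splitting and a similarity-aware split concrete, the following is a minimal sketch in plain Python. It is not the authors' procedure: the toy fingerprints, the Tanimoto threshold of 0.5, the connected-component grouping, the smallest-components-first assignment, and all function names (`similarity_components`, `component_split`, `pcm_features`) are assumptions made for illustration. It also shows the PCM idea of feeding a compound vector and a protein vector to a model as one concatenated pair.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_components(fingerprints, threshold):
    """Connected components of a network linking items with similarity >= threshold."""
    parent = {name: name for name in fingerprints}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in combinations(fingerprints, 2):
        if tanimoto(fingerprints[u], fingerprints[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for name in fingerprints:
        groups.setdefault(find(name), set()).add(name)
    # Largest components first; ties broken by member names for determinism.
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


def component_split(components, test_fraction):
    """Send whole components to the test fold, smallest first, until the target size."""
    target = test_fraction * sum(len(c) for c in components)
    train, test = set(), set()
    for comp in reversed(components):
        if len(test) < target:
            test |= comp
        else:
            train |= comp
    return train, test


def pcm_features(compound_vec, protein_vec):
    """PCM-style input: compound and protein features concatenated as one pair vector."""
    return list(compound_vec) + list(protein_vec)


def bits_to_vector(bits, n_bits=32):
    """Expand a set of 'on' bits into a fixed-length 0/1 list."""
    return [1 if i in bits else 0 for i in range(n_bits)]


# Hypothetical toy data: compound fingerprints as sets of 'on' bits.
compound_bits = {
    "c1": {1, 2, 3, 4}, "c2": {1, 2, 3, 5}, "c3": {2, 3, 4, 5},
    "c4": {10, 11, 12}, "c5": {10, 11, 13}, "c6": {20, 21},
}
# Hypothetical 2-dimensional protein descriptors (stand-ins for learned embeddings).
protein_vecs = {"p1": [0.1, 0.9], "p2": [0.7, 0.3]}
# (compound, protein, label) triples; labels are made up.
interactions = [("c1", "p1", 1), ("c2", "p1", 1), ("c4", "p2", 1), ("c6", "p1", 0)]

components = similarity_components(compound_bits, threshold=0.5)
train_compounds, test_compounds = component_split(components, test_fraction=0.5)

train_rows, test_rows = [], []
for compound, protein, label in interactions:
    row = (pcm_features(bits_to_vector(compound_bits[compound]), protein_vecs[protein]), label)
    (train_rows if compound in train_compounds else test_rows).append(row)

print("train compounds:", sorted(train_compounds))
print("test compounds:", sorted(test_compounds))
```

Because whole components are kept together, no test compound is linked to any training compound at the chosen threshold, which is the property a random split does not guarantee. A fuller treatment would presumably apply the same idea to proteins as well as compounds, and would use real fingerprints and embeddings instead of the toy values here.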


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1c9b/9901167/12d4a267de2f/13321_2023_689_Fig1_HTML.jpg
