Drug-Target Interactions Prediction at Scale: The Komet Algorithm with the LCIdb Dataset.

Affiliations

Center for Computational Biology (CBIO), Mines Paris-PSL, 75006 Paris, France.

Institut Curie, Université PSL, 75005 Paris, France.

Publication information

J Chem Inf Model. 2024 Sep 23;64(18):6938-6956. doi: 10.1021/acs.jcim.4c00422. Epub 2024 Sep 5.

DOI:10.1021/acs.jcim.4c00422

PMID:39237105

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11423346/

Abstract

Drug-target interaction (DTI) prediction algorithms are used at various stages of the drug discovery process. In this context, specific problems such as deorphanization of a new therapeutic target, or target identification for a drug candidate arising from phenotypic screens, require large-scale predictions across the protein and molecule spaces. DTI prediction relies heavily on supervised learning algorithms that use known DTIs to learn associations between molecule and protein features, allowing new interactions to be predicted from the learned patterns. The algorithms must be broadly applicable to enable reliable predictions, even in regions of the protein or molecule spaces where data may be scarce. In this paper, we address two key challenges in meeting these goals: building large, high-quality training datasets, and designing prediction methods that scale well enough to be trained on such datasets. First, we introduce LCIdb, a large, curated dataset of DTIs offering extensive coverage of both the molecule and druggable protein spaces. Notably, LCIdb contains far more molecules than publicly available benchmarks, expanding coverage of the molecule space. Second, we propose Komet (Kronecker Optimized METhod), a DTI prediction pipeline designed for scalability without compromising performance. Komet uses a three-step framework that incorporates efficient computation choices tailored for large datasets and involves the Nyström approximation. Specifically, Komet employs a Kronecker interaction module for (molecule, protein) pairs, which efficiently captures determinants of DTIs and whose structure allows for reduced computational complexity and quasi-Newton optimization, ensuring that the model can handle large training sets without compromising on performance. Our method is implemented in open-source software that leverages GPU parallel computation for efficiency. We demonstrate the value of our pipeline on various datasets, showing that Komet displays superior scalability and prediction performance compared to state-of-the-art deep learning approaches. Additionally, we illustrate the generalization properties of Komet by showing its performance on an external dataset and on a publicly available benchmark designed for scaffold hopping problems. Komet is available as open source at https://komet.readthedocs.io, and all datasets, including LCIdb, can be found at https://zenodo.org/records/10731712.
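The abstract names two computational ingredients, the Nyström approximation and a Kronecker interaction module, without giving implementation details. The block below is a minimal NumPy sketch of both ideas, not the Komet code itself. The Gaussian kernel, the landmark choice, the random toy data, and every function and variable name (`rbf_kernel`, `nystrom_features`, `pair_scores`, and so on) are assumptions made for illustration. It shows how explicit features can be built from a kernel with a few landmark points, and how a linear score on the Kronecker product of molecule and protein features can be evaluated without ever forming that product.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def nystrom_features(X, landmarks, gamma=0.5, eps=1e-10):
    """Explicit features whose inner products approximate the RBF kernel.

    With K_ll = U diag(evals) U^T the kernel among landmarks, the features are
    K(X, landmarks) @ U @ diag(evals)^(-1/2).
    """
    K_ll = rbf_kernel(landmarks, landmarks, gamma)
    evals, evecs = np.linalg.eigh(K_ll)
    keep = evals > eps                      # drop numerically null directions
    proj = evecs[:, keep] / np.sqrt(evals[keep])
    return rbf_kernel(X, landmarks, gamma) @ proj


def pair_scores(w, mol_feats, prot_feats, mol_idx, prot_idx):
    """Linear scores <w, kron(x_m, x_p)> for each (molecule, protein) pair.

    Reshaping w into a (d_mol, d_prot) matrix W gives the same value as
    x_m^T W x_p, so the Kronecker product is never materialized.
    """
    W = w.reshape(mol_feats.shape[1], prot_feats.shape[1])
    return np.einsum("nd,de,ne->n", mol_feats[mol_idx], W, prot_feats[prot_idx])


# Toy data (hypothetical): 6 molecules and 5 proteins as random raw descriptors.
rng = np.random.default_rng(0)
mol_raw = rng.normal(size=(6, 4))
prot_raw = rng.normal(size=(5, 3))

# Use the first few items of each space as landmarks (an arbitrary choice).
mol_feats = nystrom_features(mol_raw, mol_raw[:3])
prot_feats = nystrom_features(prot_raw, prot_raw[:2])

# Random weights stand in for a trained model.
w = rng.normal(size=mol_feats.shape[1] * prot_feats.shape[1])

# Three (molecule, protein) pairs given by index.
mol_idx = np.array([0, 2, 5])
prot_idx = np.array([1, 1, 4])
scores = pair_scores(w, mol_feats, prot_feats, mol_idx, prot_idx)

# Reference computation that does build each Kronecker product explicitly.
explicit = np.array(
    [w @ np.kron(mol_feats[i], prot_feats[j]) for i, j in zip(mol_idx, prot_idx)]
)
```

Here `scores` and `explicit` agree up to floating-point error, while `pair_scores` only needs memory proportional to the number of pairs times the per-space feature dimensions. In a full pipeline, a loss on these scores and its gradient with respect to `w` could be handed to a quasi-Newton solver; that step is omitted from this sketch.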

Image: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3aa7/11423346/e648bb246238/ci4c00422_0001.jpg

Similar articles

DTI-LM: language model powered drug-target interaction prediction.

Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae533.

Predicting Drug-Target Interactions Based on Small Positive Samples.

Curr Protein Pept Sci. 2018;19(5):479-487. doi: 10.2174/1389203718666161108102330.

GSRF-DTI: a framework for drug-target interaction prediction based on a drug-target pair network and representation learning on a large graph.

BMC Biol. 2024 Jul 18;22(1):156. doi: 10.1186/s12915-024-01949-3.

Predicting drug-target interactions using restricted Boltzmann machines.

Bioinformatics. 2013 Jul 1;29(13):i126-34. doi: 10.1093/bioinformatics/btt234.

GraphormerDTI: A graph transformer-based approach for drug-target interaction prediction.

Comput Biol Med. 2024 May;173:108339. doi: 10.1016/j.compbiomed.2024.108339. Epub 2024 Mar 18.

Inferring Interactions between Novel Drugs and Novel Targets via Instance-Neighborhood-Based Models.

Curr Protein Pept Sci. 2018;19(5):488-497. doi: 10.2174/1389203718666161108093907.

Drug-target interaction prediction using Multi Graph Regularized Nuclear Norm Minimization.

PLoS One. 2020 Jan 16;15(1):e0226484. doi: 10.1371/journal.pone.0226484. eCollection 2020.

DTI-MLCD: predicting drug-target interactions using multi-label learning with community detection method.

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa205.

A Machine Learning Approach for Drug-target Interaction Prediction using Wrapper Feature Selection and Class Balancing.

Mol Inform. 2020 May;39(5):e1900062. doi: 10.1002/minf.201900062. Epub 2020 Feb 11.
