A novel methodology on distributed representations of proteins using their interacting ligands.

Affiliations

Department of Computer Engineering, Bogazici University, Istanbul, Turkey.

Department of Chemical Engineering, Bogazici University, Istanbul, Turkey.

Publication information

Bioinformatics. 2018 Jul 1;34(13):i295-i303. doi: 10.1093/bioinformatics/bty287.

DOI:10.1093/bioinformatics/bty287

PMID:29949957

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6022674/

Abstract

MOTIVATION

The effective representation of proteins is a crucial task that directly affects performance on many bioinformatics problems. Related proteins usually bind to similar ligands. The chemical characteristics of ligands are known to capture the functional and mechanistic properties of proteins, suggesting that a ligand-based approach can be used for protein representation. In this study, we propose SMILESVec, a Simplified Molecular Input Line Entry System (SMILES)-based method to represent ligands, and a novel method to compute the similarity of proteins by describing them in terms of their ligands. Proteins are defined using the word embeddings of the SMILES strings of their ligands. The performance of the proposed protein description method is evaluated on a protein clustering task using the TransClust and MCL algorithms. It is compared against two protein representation methods that use protein sequence, the Basic Local Alignment Search Tool (BLAST) and ProtVec, and against two compound fingerprint-based protein representation methods.
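The idea of describing a protein through the word embeddings of its ligands' SMILES strings can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the function names are hypothetical, SMILES strings are split into overlapping substrings of length `k` (8 is an arbitrary choice here), vectors are combined by simple averaging, and `toy_embedding` is a hash-derived placeholder standing in for a trained word-embedding model. It uses only the Python standard library.

```python
import hashlib
import math


def chemical_words(smiles, k=8):
    """Split a SMILES string into overlapping k-character substrings."""
    if len(smiles) <= k:
        return [smiles]
    return [smiles[i:i + k] for i in range(len(smiles) - k + 1)]


def toy_embedding(word, dim=16):
    """Deterministic placeholder vector for a word.

    ASSUMPTION: this is NOT a trained embedding; it only stands in for
    one so the sketch runs without external models. dim must be <= 32.
    """
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    # Map each byte (0..255) to a value in [-1, 1).
    return [(b - 128) / 128.0 for b in digest[:dim]]


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def ligand_vector(smiles, k=8, dim=16):
    """Ligand vector: mean of the vectors of its SMILES substrings."""
    return mean_vector([toy_embedding(w, dim) for w in chemical_words(smiles, k)])


def protein_vector(ligand_smiles, k=8, dim=16):
    """Protein vector: mean of the vectors of the ligands it binds."""
    return mean_vector([ligand_vector(s, k, dim) for s in ligand_smiles])


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical ligand sets for two proteins (aspirin, ibuprofen, caffeine).
    protein_a = ["CC(=O)OC1=CC=CC=C1C(=O)O", "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"]
    protein_b = ["CC(=O)OC1=CC=CC=C1C(=O)O", "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"]
    sim = cosine_similarity(protein_vector(protein_a), protein_vector(protein_b))
    print(f"ligand-based similarity: {sim:.3f}")
```

With a real embedding model in place of `toy_embedding`, the pairwise similarities computed this way could be passed to a clustering algorithm such as those named above. The placeholder vectors carry no chemical meaning, so the printed value only demonstrates the data flow.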

RESULTS

We show that a ligand-based protein representation, which uses only the SMILES strings of the ligands that proteins bind to, performs as well as protein sequence-based representation methods in protein clustering. The results suggest that ligand-based protein description can be an alternative to traditional sequence- or structure-based representations of proteins, and that this approach can be applied to different bioinformatics problems such as the prediction of new protein-ligand interactions and protein function annotation.

AVAILABILITY AND IMPLEMENTATION

https://github.com/hkmztrk/SMILESVecProteinRepresentation.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

Granular clustering of de novo protein models.

Bioinformatics. 2017 Feb 1;33(3):390-396. doi: 10.1093/bioinformatics/btw628.

DeepDTA: deep drug-target binding affinity prediction.

Bioinformatics. 2018 Sep 1;34(17):i821-i829. doi: 10.1093/bioinformatics/bty593.

Detection of 3D atomic similarities and their use in the discrimination of small molecule protein-binding sites.

Bioinformatics. 2008 Aug 15;24(16):i105-11. doi: 10.1093/bioinformatics/btn263.

LS-align: an atom-level, flexible ligand structural alignment algorithm for high-throughput virtual screening.

Bioinformatics. 2018 Jul 1;34(13):2209-2218. doi: 10.1093/bioinformatics/bty081.

Prediction of Protein-Ligand Interaction Based on the Positional Similarity Scores Derived from Amino Acid Sequences.

Int J Mol Sci. 2019 Dec 18;21(1):24. doi: 10.3390/ijms21010024.

Patch-DCA: improved protein interface prediction by utilizing structural information and clustering DCA scores.

Bioinformatics. 2020 Mar 1;36(5):1460-1467. doi: 10.1093/bioinformatics/btz791.

Learned protein embeddings for machine learning.

Bioinformatics. 2018 Aug 1;34(15):2642-2648. doi: 10.1093/bioinformatics/bty178.

Surface-based multimodal protein-ligand binding affinity prediction.

Bioinformatics. 2024 Jul 1;40(7). doi: 10.1093/bioinformatics/btae413.

DEAttentionDTA: protein-ligand binding affinity prediction based on dynamic embedding and self-attention.

Bioinformatics. 2024 Jun 3;40(6). doi: 10.1093/bioinformatics/btae319.

Cited by

Beyond the leaderboard: leveraging predictive modeling for protein-ligand insights and discovery.

Bioinformatics. 2025 Aug 2;41(8). doi: 10.1093/bioinformatics/btaf425.

Domain adaptable language modeling of chemical compounds identifies potent pathoblockers for Pseudomonas aeruginosa.

Commun Chem. 2025 Apr 11;8(1):114. doi: 10.1038/s42004-025-01484-4.

Natural Language Processing Methods for the Study of Protein-Ligand Interactions.

J Chem Inf Model. 2025 Mar 10;65(5):2191-2213. doi: 10.1021/acs.jcim.4c01907. Epub 2025 Feb 24.

Natural Language Processing Methods for the Study of Protein-Ligand Interactions.

ArXiv. 2024 Oct 17:arXiv:2409.13057v2.

Protein feature engineering framework for AMPylation site prediction.

Sci Rep. 2024 Apr 15;14(1):8695. doi: 10.1038/s41598-024-58450-8.

PSnpBind-ML: predicting the effect of binding site mutations on protein-ligand binding affinity.

J Cheminform. 2023 Mar 2;15(1):31. doi: 10.1186/s13321-023-00701-3.

Gene expression based inference of cancer drug sensitivity.

Nat Commun. 2022 Sep 27;13(1):5680. doi: 10.1038/s41467-022-33291-z.

Organizing the bacterial annotation space with amino acid sequence embeddings.

BMC Bioinformatics. 2022 Sep 23;23(1):385. doi: 10.1186/s12859-022-04930-5.

Machine Learning in Antibacterial Drug Design.

Front Pharmacol. 2022 May 3;13:864412. doi: 10.3389/fphar.2022.864412. eCollection 2022.

Deep Neural Network-Assisted Drug Recommendation Systems for Identifying Potential Drug-Target Interactions.

ACS Omega. 2022 Mar 31;7(14):12138-12146. doi: 10.1021/acsomega.2c00424. eCollection 2022 Apr 12.
