TT3D: Leveraging precomputed protein 3D sequence models to predict protein-protein interactions.

Affiliations

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, United States.

Department of Computer Science, Tufts University, 177 College Avenue, Medford, MA 02155, United States.

Publication information

Bioinformatics. 2023 Nov 1;39(11). doi: 10.1093/bioinformatics/btad663.

Abstract

MOTIVATION

High-quality computational structural models are now precomputed and available for nearly every protein in UniProt. However, it is not immediately clear how best to leverage these models to predict, in a high-throughput manner, which pairs of proteins interact. The recent Foldseek method of van Kempen et al. encodes the structural information of distances and angles along the protein backbone into a linear string of the same length as the protein's amino acid sequence, using tokens from a 21-letter discretized structural alphabet (3Di).
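As a rough illustration of the encoding described above, the sketch below pairs each residue of an amino acid sequence with a one-hot vector for its 3Di token. It is a minimal sketch, not TT3D's actual featurization: the token labels in `THREE_DI_ALPHABET` (the 20 standard amino-acid letters plus `X` for unknown), the helper names `one_hot_3di` and `pair_residue_features`, and the toy strings are all assumptions made for illustration.

```python
# Assumed labels for the 21-letter structural alphabet (illustrative only):
# the 20 standard amino-acid letters plus X for an unknown state.
THREE_DI_ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"
TOKEN_INDEX = {tok: i for i, tok in enumerate(THREE_DI_ALPHABET)}


def one_hot_3di(three_di):
    """Return one 21-dimensional one-hot vector per structural token."""
    rows = []
    for tok in three_di.upper():
        row = [0] * len(THREE_DI_ALPHABET)
        # Tokens outside the assumed alphabet fall back to X.
        row[TOKEN_INDEX.get(tok, TOKEN_INDEX["X"])] = 1
        rows.append(row)
    return rows


def pair_residue_features(aa_seq, three_di):
    """Pair each residue letter with the one-hot vector of its 3Di token."""
    # The 3Di string has one token per residue, so the lengths must match.
    if len(aa_seq) != len(three_di):
        raise ValueError("amino acid and 3Di strings must have equal length")
    return list(zip(aa_seq, one_hot_3di(three_di)))


# Made-up toy strings, not real Foldseek output.
features = pair_residue_features("MKTAYIA", "DVVLCPP")
```

Because both strings share one index per residue, any per-residue sequence representation can be extended with structural information simply by concatenating along the feature axis.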

RESULTS

We show that using both the amino acid sequence and the 3Di sequence generated by Foldseek as inputs to our recent deep-learning method, Topsy-Turvy, substantially improves cross-species prediction of protein-protein interactions. Thus, TT3D (Topsy-Turvy 3D) offers a way to reuse the computational effort that goes into producing high-quality structural models from sequence, while remaining lightweight enough that high-quality binary protein-protein interaction predictions can be made genome-wide across all protein pairs.
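To give a sense of why a lightweight method matters at genome scale, the short sketch below counts the unordered protein pairs that an all-against-all screen would have to score. The proteome size of 20,000 is a round number assumed for illustration, and `count_unordered_pairs` is a hypothetical helper, not part of TT3D.

```python
import math
from itertools import combinations


def count_unordered_pairs(n_proteins):
    """Number of distinct unordered pairs among n proteins: n * (n - 1) / 2."""
    return math.comb(n_proteins, 2)


# Assumed round proteome size, for illustration only.
n_pairs = count_unordered_pairs(20_000)

# Enumerating candidate pairs for a tiny set of placeholder identifiers.
toy_ids = ["P1", "P2", "P3", "P4"]
toy_pairs = list(combinations(toy_ids, 2))
```

With 20,000 proteins this comes to 199,990,000 candidate pairs, so the per-pair cost of the predictor dominates the total running time.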

AVAILABILITY AND IMPLEMENTATION

TT3D is available at https://github.com/samsledje/D-SCRIPT. An archived version of the code at time of submission can be found at https://zenodo.org/records/10037674.

