PairK: Pairwise k-mer alignment for quantifying protein motif conservation in disordered regions.

Author information

Halpin Jackson C, Keating Amy E

Affiliations

Department of Biology, MIT, Cambridge, Massachusetts, USA.

Department of Biological Engineering, MIT, Cambridge, Massachusetts, USA.

Publication information

Protein Sci. 2025 Jan;34(1):e70004. doi: 10.1002/pro.70004.

Abstract

Protein-protein interactions are often mediated by a modular peptide recognition domain binding to a short linear motif (SLiM) in the disordered region of another protein. To understand the features of SLiMs that are important for binding and to identify motif instances that are important for biological function, it is useful to examine the evolutionary conservation of motifs across homologous proteins. However, the intrinsically disordered regions (IDRs) in which SLiMs reside evolve rapidly. Consequently, multiple sequence alignment (MSA) of IDRs often misaligns SLiMs and underestimates their conservation. We present PairK (pairwise k-mer alignment), an MSA-free method to align and quantify the relative local conservation of subsequences within an IDR. Lacking a ground truth for conservation, we tested PairK on the task of distinguishing biologically important motif instances from background motifs, under the assumption that biologically important motifs are more conserved. The method outperforms both standard MSA-based conservation scores and a modern large language model (LLM)-based conservation score predictor. PairK can quantify conservation over wider phylogenetic distances than MSAs, indicating that some SLiMs are more conserved than MSA-based metrics imply. PairK is available as an open-source Python package at https://github.com/jacksonh1/pairk. It is designed to be easily adapted for use with other SLiM tools and for diverse applications.
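The abstract does not give PairK's scoring details. The following is a minimal, hypothetical Python sketch of the general idea behind pairwise k-mer alignment, not the pairk package's API: each overlapping k-mer of a query IDR is compared without gaps to every k-mer of each homolog's IDR, the best-scoring match per homolog is kept, and the per-k-mer scores are converted to z-scores so that conservation is expressed relative to the rest of the same IDR. The function names, the toy identity score (standing in for a substitution matrix), and the z-score normalization are all assumptions made for illustration.

```python
# Hypothetical sketch of MSA-free pairwise k-mer conservation scoring.
# NOT the pairk package API; names and scoring choices are illustrative assumptions.
from statistics import mean, pstdev
from typing import Optional


def kmers(seq: str, k: int) -> list[str]:
    """Return all overlapping k-mers of seq (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def identity_score(a: str, b: str) -> int:
    """Toy gapless score: count identical positions.

    Assumption: stands in for a substitution-matrix score.
    """
    return sum(x == y for x, y in zip(a, b))


def best_match(query_kmer: str, homolog: str) -> Optional[str]:
    """Return the best-scoring same-length k-mer in a homolog IDR (no gaps)."""
    candidates = kmers(homolog, len(query_kmer))
    if not candidates:
        return None
    return max(candidates, key=lambda c: identity_score(query_kmer, c))


def kmer_conservation(query: str, homologs: list[str], k: int) -> list[float]:
    """Raw score per query k-mer: mean fractional identity to its best match
    in each homolog. Homologs shorter than k are skipped."""
    scores = []
    for qk in kmers(query, k):
        matches = [best_match(qk, h) for h in homologs]
        matches = [m for m in matches if m is not None]
        if matches:
            scores.append(mean(identity_score(qk, m) / k for m in matches))
        else:
            scores.append(0.0)
    return scores


def z_scores(values: list[float]) -> list[float]:
    """Assumed normalization: score each k-mer relative to all k-mers of the same IDR."""
    if not values:
        return []
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


if __name__ == "__main__":
    # Hypothetical toy IDRs: a shared PPLPPR stretch inside divergent flanks.
    query = "AAAPPLPPRAAA"
    homologs = ["GGGPPLPPRGGG", "SSPPLPPRSS"]
    raw = kmer_conservation(query, homologs, k=5)
    for kmer, r, z in zip(kmers(query, 5), raw, z_scores(raw)):
        print(f"{kmer}  raw={r:.2f}  z={z:+.2f}")
```

In this toy run the k-mers covering the shared proline-rich stretch get the highest relative scores even though the flanks differ completely. Because each query k-mer is matched independently against each homolog, no multiple sequence alignment is built at any point; replacing identity_score with a substitution-matrix score would be the most obvious refinement.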
