SHARK enables sensitive detection of evolutionary homologs and functional analogs in unalignable and disordered sequences.

Affiliations

Max Planck Institute of Molecular Cell Biology and Genetics, Dresden 01307, Germany.

Center for Systems Biology Dresden, Dresden 01307, Germany.

Publication information

Proc Natl Acad Sci U S A. 2024 Oct 15;121(42):e2401622121. doi: 10.1073/pnas.2401622121. Epub 2024 Oct 9.

Abstract

Intrinsically disordered regions (IDRs) are structurally flexible protein segments with regulatory functions in multiple contexts, such as in the assembly of biomolecular condensates. Since IDRs undergo more rapid evolution than ordered regions, identifying homology of such poorly conserved regions remains challenging for state-of-the-art alignment-based methods that rely on position-specific conservation of residues. Thus, systematic functional annotation and evolutionary analysis of IDRs have been limited, even though IDRs comprise ~21% of proteins. To accurately assess homology between unalignable sequences, we developed an alignment-free sequence comparison algorithm, SHARK (Similarity/Homology Assessment by Relating K-mers). We trained SHARK-dive, a machine learning homology classifier, which achieved superior performance to standard alignment-based approaches in assessing evolutionary homology in unalignable sequences. Furthermore, it correctly identified dissimilar but functionally analogous IDRs in IDR-replacement experiments reported in the literature, whereas alignment-based tools were incapable of detecting such functional relationships. SHARK-dive not only predicts functionally similar IDRs at a proteome-wide scale but also identifies cryptic sequence properties and motifs that drive remote homology and analogy, thereby providing interpretable and experimentally verifiable hypotheses of the sequence determinants that underlie such relationships. SHARK-dive acts as an alternative to alignment to facilitate systematic analysis and functional annotation of the unalignable protein universe.
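To make the idea of alignment-free comparison concrete, the sketch below scores two sequences by the cosine similarity of their k-mer count vectors. This is a generic illustration only and is not SHARK's scoring scheme, which the abstract does not specify; the function names `kmer_counts` and `cosine_similarity` and the default `k = 3` are assumptions chosen for the example.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a sequence (hypothetical helper)."""
    seq = seq.upper()
    # range() is empty when the sequence is shorter than k, giving an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: str, b: str, k: int = 3) -> float:
    """Alignment-free similarity: cosine of the two k-mer count vectors.

    Illustrative only; this is NOT the SHARK score, just a common
    alignment-free baseline that ignores residue positions entirely.
    """
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    # Only k-mers present in both sequences contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[kmer] * counts_b[kmer] for kmer in shared)
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one sequence has no k-mers of length k
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Two toy repeat sequences that share composition but are offset by one residue.
    print(cosine_similarity("SRSRSR", "RSRSRS", k=2))
```

Because the score depends only on which k-mers occur and how often, two sequences with shared short motifs score highly even when no positional alignment exists between them, which is the property that makes this family of methods suited to poorly conserved regions.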

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1336/11494347/249fe73d44ac/pnas.2401622121fig01.jpg