Su Kathy, Mayans Olga, Diederichs Kay, Fleming Jennifer R
Department of Biology, Universität Konstanz, Konstanz, Baden-Württemberg 78456, Germany.
Comput Struct Biotechnol J. 2022 Sep 26;20:5409-5419. doi: 10.1016/j.csbj.2022.09.034. eCollection 2022.
Sequence comparison is critical for the functional assignment of newly identified protein genes. As uncharacterized protein sequences accumulate, there is an increasing need for sensitive tools for their classification. Here, we present a novel multidimensional scaling pipeline, PaSiMap, which creates a map of pairwise sequence similarities. Uniquely, PaSiMap distinguishes between unique and shared features, allowing for a distinct view of protein-sequence relationships. We demonstrate PaSiMap's efficiency in detecting sequence groups and outliers using titin's 169 immunoglobulin (Ig) domains. We show that Ig domain similarity is hierarchical: it is determined first by chain location, then by the loop features of the Ig fold and, finally, by super-repeat position. The analysis reveals a previously unidentified domain repeat in the distal, constitutive I-band. Prototypic Igs and notable outliers are identified, thereby improving domain classification. This re-classification can now guide future molecular research. In summary, we demonstrate that PaSiMap is a sensitive tool for the classification of protein sequences, which adds a new perspective to the understanding of inter-protein relationships. PaSiMap is applicable to any biological system defined by a linear sequence, including polynucleotide chains.
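To illustrate the general idea of mapping pairwise sequence similarities into a low-dimensional space, the following is a minimal sketch of classical (Torgerson) multidimensional scaling with NumPy. It is not PaSiMap's own algorithm, which the abstract does not specify. The function name `classical_mds`, the conversion of similarity to dissimilarity as `1 - similarity`, and the toy similarity values are all assumptions made for illustration only.

```python
import numpy as np


def classical_mds(similarity, n_dims=2):
    """Embed items in n_dims coordinates from a symmetric similarity matrix.

    Assumes similarities lie in [0, 1]; dissimilarity is taken as 1 - similarity
    (an illustrative choice, not one taken from the source).
    """
    s = np.asarray(similarity, dtype=float)
    d2 = (1.0 - s) ** 2                       # squared dissimilarities
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ d2 @ centering     # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against negative eigenvalues
    return eigvecs[:, order] * np.sqrt(top_vals)


# Hypothetical similarity scores for four sequences forming two groups.
toy_similarity = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.1, 0.2],
    [0.2, 0.1, 1.0, 0.9],
    [0.1, 0.2, 0.9, 1.0],
])

coords = classical_mds(toy_similarity)
print(coords)
```

In this toy case the first coordinate separates sequences 0 and 1 from sequences 2 and 3, which is the kind of group structure such a map is meant to expose. Real input would be a matrix of alignment-derived similarity scores rather than hand-written values.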