He David, Parkinson John
Program in Molecular Structure and Function, Hospital for Sick Children, University of Toronto, Toronto, Canada.
Bioinformatics. 2008 Apr 1;24(7):1016-7. doi: 10.1093/bioinformatics/btn073. Epub 2008 Feb 26.
Low-complexity, repetitive protein sequences with a limited amino acid palette are abundant in nature, and many of them play an important role in the structure and function of certain types of proteins. However, such repetitive sequences often lack rigidly defined motifs, so identifying these low-complexity repetitive elements has proven challenging for existing pattern-matching algorithms. Here we introduce a new web tool, SubSeqer (http://compsysbio.org/subseqer/), which uses graphical visualization methods borrowed from protein interaction studies to identify and characterize repetitive elements in low-complexity sequences. Given the abundance of such sequences, we suggest that SubSeqer represents a valuable resource for the study of typically neglected low-complexity sequences.
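To make the graph-based idea concrete, the following is a minimal sketch of one way a sequence could be turned into a graph of short subsequences. It is an illustration only and is not taken from SubSeqer: the abstract does not describe the tool's algorithm, so the function names (`subsequence_graph`, `repeated_elements`), the window length `k`, and the toy sequence are all assumptions. Nodes are overlapping subsequences of length `k`, edges link subsequences that follow one another in the sequence, and counts record how often each occurs.

```python
from collections import Counter


def subsequence_graph(seq: str, k: int = 3):
    """Build a weighted graph of overlapping length-k subsequences.

    Hypothetical helper for illustration; not SubSeqer's actual method.
    Returns (nodes, edges): occurrence counts for each subsequence and for
    each ordered pair of consecutive subsequences.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    nodes = Counter(kmers)                  # node weight = occurrences
    edges = Counter(zip(kmers, kmers[1:]))  # edge weight = transitions
    return nodes, edges


def repeated_elements(nodes, min_count: int = 2):
    """List subsequences seen at least min_count times, most frequent first."""
    hits = [kmer for kmer, n in nodes.items() if n >= min_count]
    return sorted(hits, key=lambda kmer: (-nodes[kmer], kmer))


# Made-up toy sequence with a loosely elastin-like repeat (an assumption).
toy = "GVGVPGVGVPGVGVP"
nodes, edges = subsequence_graph(toy, k=3)
for kmer in repeated_elements(nodes):
    print(kmer, nodes[kmer])
```

In this toy case the five distinct subsequences form a single cycle in the graph, which is the kind of structure a visualization could expose even when the repeat has no rigidly defined motif. A real analysis would need to tolerate substitutions and variable spacing, which this sketch does not attempt.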