Hrabe Thomas, Jaroszewski Lukasz, Godzik Adam
Department of Bioinformatics and Systems Biology, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA.
Bioinformatics. 2016 Sep 15;32(18):2776-82. doi: 10.1093/bioinformatics/btw319. Epub 2016 Jun 9.
Repeat proteins, which contain multiple repeats of short sequence motifs, form a large but seldom-studied group of proteins. Methods focusing on the analysis of 3D structures of such proteins have identified many subtle effects in the length distribution of individual motifs that are important for their functions. However, similar analysis has not yet been applied to the vast majority of repeat proteins with unknown 3D structures, mostly because of the extreme diversity of the underlying motifs and the resulting difficulty of detecting them.
We developed FAIT, a sequence-based algorithm for the precise assignment of individual repeats in repeat proteins, and introduced a framework to classify and compare aperiodicity patterns for large protein families. FAIT extracts repeat positions by post-processing FFAS alignment matrices with image processing methods. Using proteins with leucine-rich repeat (LRR) domains and other solenoid-like proteins as examples, we show that automated analysis with FAIT correctly identifies the exact lengths of individual repeats based entirely on sequence information.
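The general idea of reading repeats off a comparison matrix can be illustrated with a toy sketch. This is not the FAIT implementation: it assumes a self-comparison matrix, replaces FFAS profile-profile scores with simple residue identity, and replaces the image processing step with a plain average along each off-diagonal. It also estimates only a single dominant repeat length, not the individual repeat lengths that FAIT assigns. All function names and the toy sequence are hypothetical.

```python
def self_similarity_matrix(seq):
    """Binary self-comparison matrix: 1 where residues are identical, else 0.
    Assumption: a crude stand-in for an FFAS alignment score matrix."""
    n = len(seq)
    return [[1 if seq[i] == seq[j] else 0 for j in range(n)] for i in range(n)]


def diagonal_scores(matrix, min_offset=2):
    """Mean score along each off-diagonal k, i.e. over cells (i, i + k).
    A repeat of length k shows up as a strong line parallel to the main diagonal."""
    n = len(matrix)
    scores = {}
    for k in range(min_offset, n // 2 + 1):
        cells = [matrix[i][i + k] for i in range(n - k)]
        scores[k] = sum(cells) / len(cells)
    return scores


def estimate_period(seq, min_offset=2):
    """Return the offset with the strongest diagonal; ties go to the shortest offset."""
    scores = diagonal_scores(self_similarity_matrix(seq), min_offset)
    return max(scores, key=lambda k: (scores[k], -k))


# Hypothetical toy sequence: one 12-residue unit repeated four times.
toy = "LRELDLSNNKIT" * 4
print(estimate_period(toy))  # 12
```

On real proteins the repeats are divergent and vary in length, so the diagonals are broken and shifted; that irregularity is what a profile-based score matrix and a more careful line-detection step would need to handle.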
Availability: https://github.com/GodzikLab/FAIT
Contact: adam@godziklab.org
Supplementary data are available at Bioinformatics online.