Department of Biomedical Sciences, University of Padua, 35131 Padova, Italy, Department of Biological Chemistry, Universidad de Buenos Aires, Buenos Aires C1428EGA, Argentina, Department of Information Engineering, University of Padua, 35121 Padova, Italy, Department of Biosciences, COMSATS Institute of Information Technology, Sahiwal, Pakistan, Centre de Recherches de Biochimie Macromoléculaire, CNRS, 34293 Montpellier Cedex 5, France and Institut de Biologie Computationnelle, 34293 Montpellier Cedex 5, France.
Nucleic Acids Res. 2014 Jan;42(Database issue):D352-7. doi: 10.1093/nar/gkt1175. Epub 2013 Dec 5.
RepeatsDB (http://repeatsdb.bio.unipd.it/) is a database of annotated tandem repeat protein structures. Tandem repeats pose a difficult problem for the analysis of protein structures, as the underlying sequence can be highly degenerate. Several repeat types haven been studied over the years, but their annotation was done in a case-by-case basis, thus making large-scale analysis difficult. We developed RepeatsDB to fill this gap. Using state-of-the-art repeat detection methods and manual curation, we systematically annotated the Protein Data Bank, predicting 10,745 repeat structures. In all, 2797 structures were classified according to a recently proposed classification schema, which was expanded to accommodate new findings. In addition, detailed annotations were performed in a subset of 321 proteins. These annotations feature information on start and end positions for the repeat regions and units. RepeatsDB is an ongoing effort to systematically classify and annotate structural protein repeats in a consistent way. It provides users with the possibility to access and download high-quality datasets either interactively or programmatically through web services.
RepeatsDB(http://repeatsdb.bio.unipd.it/)是一个注释串联重复蛋白结构的数据库。串联重复是蛋白结构分析的一个难题,因为其基础序列可能高度退化。多年来已经研究了几种重复类型,但它们的注释是逐个进行的,因此难以进行大规模分析。我们开发了 RepeatsDB 来填补这一空白。使用最先进的重复检测方法和人工策展,我们系统地注释了蛋白质数据库,预测了 10745 个重复结构。总共,根据最近提出的分类方案,将 2797 个结构进行了分类,该方案扩展以适应新的发现。此外,还对 321 个蛋白质的子集进行了详细注释。这些注释提供了有关重复区域和单位起始和结束位置的信息。RepeatsDB 是一项持续的工作,旨在以一致的方式系统地分类和注释结构蛋白重复。它为用户提供了通过 Web 服务以交互或编程方式访问和下载高质量数据集的可能性。