National Centre for Biological Sciences (TIFR), UAS-GKVK Campus, Bellary Road, 560 065 Bangalore, India.
BioData Min. 2013 Nov 15;6(1):20. doi: 10.1186/1756-0381-6-20.
The influx of newly determined crystal structures into primary structural databases is increasing rapidly. This leads to updates of the primary databases and of the secondary databases that depend on them, which makes large-scale analysis of structures even more challenging. Hence, it becomes essential to compare consecutive updates and to assess which data were replaced and which critical new data were included. PASS2 is a database of structure-based sequence alignments of protein domain superfamilies; it relies on the SCOP database for its hierarchy and its definition of superfamily members. Since accurate alignments of distantly related proteins are useful evolutionary models for depicting variation within protein superfamilies, this study aims to trace the changes in data between PASS2 updates.
In this study, differences in superfamily composition, family constituents and length variation between different versions of PASS2 have been tracked. Studying length variations in protein domains, which are introduced by indels (insertions/deletions), is important because these indels act as evolutionary signatures that introduce variation in substrate specificity and domain interactions and sometimes even regulate protein stability. In classifying the nature and source of variation in the superfamilies during transitions between the different versions of PASS2, increasing length-rigidity of the superfamilies in the recent version is observed. To study length-variant superfamilies in detail, an improved classification approach is also presented, which divides the superfamilies into distinct groups based on their extent of length variation.
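As an illustration of the kind of comparison described above, the following minimal sketch tracks members gained and lost per superfamily between two database versions and assigns each superfamily to a length-variation group. It is not the PASS2 procedure: the measure (coefficient of variation of domain length), the cut-offs, the group labels and the identifiers `sf_A`, `sf_B` and `d1` to `d6` are all assumptions made for the example, since the abstract does not state how the groups are defined.

```python
from statistics import mean, pstdev

# Hypothetical cut-offs on the coefficient of variation of domain length.
# The abstract does not give the thresholds actually used.
THRESHOLDS = [(0.10, "length-rigid"), (0.30, "moderately variable")]


def length_variation(lengths):
    """Coefficient of variation (population std / mean) of domain lengths."""
    if len(lengths) < 2:
        # A single member carries no information about variation.
        return 0.0
    return pstdev(lengths) / mean(lengths)


def classify(lengths):
    """Map a list of domain lengths to an (assumed) length-variation group."""
    cv = length_variation(lengths)
    for cutoff, label in THRESHOLDS:
        if cv <= cutoff:
            return label
    return "highly variable"


def compare_versions(old, new):
    """Compare two versions given as {superfamily: {member: domain_length}}."""
    report = {}
    for sf in sorted(set(old) | set(new)):
        old_members = old.get(sf, {})
        new_members = new.get(sf, {})
        report[sf] = {
            "added": sorted(set(new_members) - set(old_members)),
            "removed": sorted(set(old_members) - set(new_members)),
            "old_group": classify(list(old_members.values())) if old_members else None,
            "new_group": classify(list(new_members.values())) if new_members else None,
        }
    return report


# Toy data with made-up identifiers and lengths, for illustration only.
old_version = {
    "sf_A": {"d1": 100, "d2": 150, "d3": 300},
    "sf_B": {"d4": 80, "d5": 82},
}
new_version = {
    "sf_A": {"d1": 100, "d2": 104},
    "sf_B": {"d4": 80, "d5": 82, "d6": 81},
}

report = compare_versions(old_version, new_version)
for superfamily, changes in report.items():
    print(superfamily, changes)
```

In this toy case `sf_A` loses a long member and moves from the "highly variable" group to the "length-rigid" group, which mimics the shift towards length-rigidity reported for the recent version. Any other dispersion measure could be substituted in `length_variation` without changing the rest of the sketch.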
An objective study of the transition between database updates, with detailed investigation of new and old members and examination of their structural alignments, is non-trivial. It will help researchers in designing experiments on specific superfamilies, in various modelling studies, in linking representative superfamily members to the rapidly expanding sequence space and in evaluating the effects of length variations of new members in drug target proteins. The improved objective classification scheme developed here should be useful for automatic analysis of length variation when databases are updated, or even across different secondary databases.