Terashi Genki, Takeda-Shitaka Mayuko
School of Pharmacy, Kitasato University, Tokyo, Japan.
PLoS One. 2015 Oct 26;10(10):e0141440. doi: 10.1371/journal.pone.0141440. eCollection 2015.
Proteins are flexible, and this flexibility has an essential functional role. Flexibility can be observed in loop regions, rearrangements between secondary structure elements, and conformational changes between entire domains. However, most protein structure alignment methods treat protein structures as rigid bodies, so they fail to identify equivalent residue pairs in flexible regions. In this study, we considered that the evolutionary relationship between proteins corresponds directly to the residue-residue physical contacts rather than the three-dimensional (3D) coordinates of proteins. Thus, we developed a new protein structure alignment method, contact area-based alignment (CAB-align), which uses the residue-residue contact area to identify regions of similarity. The main purpose of CAB-align is to identify homologous relationships at the residue level between related protein structures. The CAB-align procedure comprises two main steps: first, a rigid-body alignment method based on local and global 3D structure superposition is employed to generate a sufficient number of initial alignments; then, iterative dynamic programming is executed to find the optimal alignment. We evaluated the performance and advantages of CAB-align based on four main points: (1) agreement with the gold standard alignment, (2) alignment quality based on an evolutionary relationship without 3D coordinate superposition, (3) consistency of the multiple alignments, and (4) classification agreement with the gold standard classification.
Comparisons of CAB-align with other state-of-the-art protein structure alignment methods (TM-align, FATCAT, and DaliLite) using our benchmark dataset showed that CAB-align performed robustly in obtaining high-quality alignments and generating consistent multiple alignments with high coverage and accuracy rates, and it performed extremely well when discriminating between homologous and nonhomologous pairs of proteins in both single and multi-domain comparisons. The CAB-align software is freely available to academic users as stand-alone software at http://www.pharm.kitasato-u.ac.jp/bmd/bmd/Publications.html.
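The second step of the procedure, iterative dynamic programming over a contact-based score, can be illustrated with a minimal Python sketch. This is not the CAB-align implementation. The scoring function (overlap of contact areas with already-aligned residue pairs), the absence of a gap penalty, and all function names (`contact_score_matrix`, `dp_align`, `refine`) are assumptions made for illustration only. The inputs are assumed to be symmetric residue-residue contact area matrices and one initial alignment, such as one produced by a rigid-body superposition.

```python
def contact_score_matrix(area_a, area_b, pairs):
    """Score residue i of A against residue j of B by how well their
    contacts with the currently aligned pairs (k, l) overlap.
    Hypothetical scoring, not the published CAB-align score."""
    n, m = len(area_a), len(area_b)
    score = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            score[i][j] = sum(
                min(area_a[i][k], area_b[j][l])
                for k, l in pairs
                if k != i and l != j
            )
    return score


def dp_align(score):
    """Order-preserving alignment maximizing the summed score.
    Gaps are free here (an assumption); scores must be non-negative."""
    n, m = len(score), len(score[0])
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score[i - 1][j - 1],  # align i with j
                best[i - 1][j],                            # skip residue of A
                best[i][j - 1],                            # skip residue of B
            )
    # Trace back, pairing residues only where the score is positive.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        s = score[i - 1][j - 1]
        if s > 0 and best[i][j] == best[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


def refine(area_a, area_b, initial_pairs, max_iter=20):
    """Alternate scoring and dynamic programming until the alignment
    stops changing or max_iter is reached."""
    pairs = list(initial_pairs)
    for _ in range(max_iter):
        new_pairs = dp_align(contact_score_matrix(area_a, area_b, pairs))
        if new_pairs == pairs:
            break
        pairs = new_pairs
    return pairs
```

In this sketch, `refine` would be called once per initial alignment and the highest-scoring result kept. Because the score depends only on contact areas and not on superposed coordinates, a hinge motion between domains does not by itself change the score of residue pairs within each domain.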