Tang Christopher L, Xie Lei, Koh Ingrid Y Y, Posy Shoshana, Alexov Emil, Honig Barry
Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Columbia University, New York, NY 10032, USA.
J Mol Biol. 2003 Dec 12;334(5):1043-62. doi: 10.1016/j.jmb.2003.10.025.
Structural alignments often reveal relationships between proteins that cannot be detected using sequence alignment alone. However, profile search methods based entirely on structural alignments have not been found to be effective in finding remote homologs. Here, we explore the role of structural information in remote homolog detection and sequence alignment. To this end, we develop a series of hybrid multidimensional alignment profiles that combine sequence, secondary structure, and tertiary structure information. Sequence-based profiles are profiles whose position-specific scoring matrix is derived from sequence alignment alone; structure-based profiles are those derived from multiple structure alignments. We compare pure sequence-based profiles to pure structure-based profiles, as well as to hybrid profiles that combine the two, using sequence-based profiles in loop/motif regions and structural information in core structural regions. All of the hybrid methods offer significant improvement over simple profile-to-profile alignment. We demonstrate that both sequence-based and structure-based profiles contribute to remote homology detection and alignment accuracy, and that each contains some unique information. We discuss the implications of these results for further improvements in amino acid sequence and structural analysis.
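The region-wise combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how core regions are delimited or how scores are merged. The sketch assumes a profile is a list of per-position score rows, that helix (H) and strand (E) positions count as core, and that a hybrid profile simply takes the structure-derived row at core positions and the sequence-derived row elsewhere. The names `core_mask_from_ss` and `hybrid_profile` and the toy numbers are hypothetical.

```python
from typing import List, Sequence

Profile = List[List[float]]  # one row of per-residue scores per alignment position


def core_mask_from_ss(secondary_structure: str) -> List[bool]:
    """Assumption: helix (H) and strand (E) positions count as core; the rest are loops."""
    return [state in ("H", "E") for state in secondary_structure]


def hybrid_profile(seq_profile: Profile, struct_profile: Profile,
                   is_core: Sequence[bool]) -> Profile:
    """Take structure-derived rows at core positions and sequence-derived rows elsewhere."""
    if not (len(seq_profile) == len(struct_profile) == len(is_core)):
        raise ValueError("profiles and core mask must cover the same positions")
    return [
        list(struct_row) if core else list(seq_row)
        for seq_row, struct_row, core in zip(seq_profile, struct_profile, is_core)
    ]


# Toy data (invented for illustration): 4 positions, 2-letter alphabet.
seq_pssm = [[1.0, -1.0], [0.5, 0.0], [2.0, -2.0], [0.0, 1.0]]
struct_pssm = [[0.2, 0.3], [1.5, -0.5], [0.1, 0.4], [-1.0, 2.0]]
mask = core_mask_from_ss("CHHC")  # loop, helix, helix, loop

hybrid = hybrid_profile(seq_pssm, struct_pssm, mask)
print(hybrid)
```

In this toy case the two middle (helix) positions come from the structure-derived matrix and the two flanking loop positions from the sequence-derived one. A real profile would have 20 columns per position, and a hard switch between sources is only one of several possible ways to combine them.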