Li Wenlin, Kinch Lisa N, Karplus P Andrew, Grishin Nick V
Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, 75390-9050.
Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, 75390-9050.
Protein Sci. 2015 Jul;24(7):1075-86. doi: 10.1002/pro.2689. Epub 2015 Jun 16.
Chameleon sequences (ChSeqs) are identical amino acid sequence strings that can adopt different conformations in protein structures. Researchers have detected and studied ChSeqs to understand the interplay between local and global interactions in protein structure formation. The different secondary structures adopted by a single ChSeq challenge sequence-based secondary structure predictors. Taking advantage of the increasing number of available Protein Data Bank structures, we here identify a large set of ChSeqs ranging from 6 to 10 residues in length. The homologous ChSeqs discovered highlight the structural plasticity involved in biological function. Compared with previous studies, the set of unrelated ChSeqs found represents an approximately 20-fold increase in the number of detected sequences, as well as an increase in the longest ChSeq length from 8 to 10 residues. We applied secondary structure predictors to our ChSeqs and found that methods based on a sequence profile outperformed methods based on a single sequence. For the unrelated ChSeqs, the evolutionary information provided by the sequence profile typically allows successful prediction of the prevailing secondary structure adopted in each protein family. Our dataset will facilitate future studies of ChSeqs, as well as interpretations of the interplay between local and nonlocal interactions. A user-friendly web interface for this ChSeq database is available at prodata.swmed.edu/chseq.
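As an illustration of the ChSeq definition in the abstract, the following minimal Python sketch scans a set of proteins for identical 6 to 10 residue strings that appear in different secondary structures. It is not the authors' pipeline. The input format (tuples of protein id, sequence, and a per-residue secondary structure string using H for helix, E for strand, and C for other), the rule that a window counts only if every residue shares one state, and the names `window_state`, `find_chameleons`, and `toy` are all assumptions made for this example.

```python
from collections import defaultdict


def window_state(ss):
    """Return 'H' or 'E' if every residue in the window has that state, else None.

    Assumption: a window is only considered when it is uniformly helix or strand.
    """
    if ss and all(c == "H" for c in ss):
        return "H"
    if ss and all(c == "E" for c in ss):
        return "E"
    return None


def find_chameleons(entries, min_len=6, max_len=10):
    """Find k-mers seen as all-helix in one place and all-strand in another.

    entries: iterable of (protein_id, sequence, ss_string), where ss_string
    has one H/E/C character per residue (hypothetical input format).
    Returns {kmer: {"H": [protein ids], "E": [protein ids]}}.
    """
    entries = list(entries)  # iterated once per window length
    found = {}
    for k in range(min_len, max_len + 1):
        # kmer -> state -> set of protein ids where it was observed
        seen = defaultdict(lambda: defaultdict(set))
        for pid, seq, ss in entries:
            if len(seq) != len(ss):
                raise ValueError(f"{pid}: sequence and structure lengths differ")
            for i in range(len(seq) - k + 1):
                state = window_state(ss[i:i + k])
                if state is not None:
                    seen[seq[i:i + k]][state].add(pid)
        for kmer, states in seen.items():
            if "H" in states and "E" in states:
                found[kmer] = {s: sorted(p) for s, p in states.items()}
    return found


# Toy data invented for this sketch: the same 7-residue string is helical
# in one made-up protein and a strand in another.
toy = [
    ("protA", "GGAVLKAVLGG", "CCHHHHHHHCC"),
    ("protB", "PPAVLKAVLPP", "CCEEEEEEECC"),
]
result = find_chameleons(toy)
```

On the toy data, `result` contains the 7-residue string `AVLKAVL` together with its two 6-residue substrings, because every window length is scanned independently. A real analysis would also need to separate homologous from unrelated proteins, which this sketch does not attempt.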