Liu Jen-Wei, Lin Jau-Ji, Cheng Chih-Wen, Lin Yu-Feng, Hwang Jenn-Kang, Huang Tsun-Tsao
Institute of Bioinformatics and Systems Biology, National Chiao Tung University, HsinChu, Taiwan, Republic of China.
Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan, Republic of China.
Proteins. 2017 Sep;85(9):1713-1723. doi: 10.1002/prot.25329. Epub 2017 Jun 28.
Residues that are crucial to protein function or structure are usually evolutionarily conserved. To identify the important residues in a protein, sequence conservation is estimated, and current methods rely on an unbiased collection of homologous sequences. Surprisingly, our previous studies have shown that sequence conservation is closely correlated with the weighted contact number (WCN), a measure of the packing density of a residue's structural environment that is calculated solely from the Cα positions of a protein structure. Moreover, studies have shown that sequence conservation is correlated with environment-related structural properties calculated from different protein substructures, such as all atoms, backbone atoms, side-chain atoms, or side-chain centroids. To determine whether Cα positions alone are adequate to capture the relationship between residue environment and sequence conservation, we compared Cα atoms with the other substructures in terms of their contributions to sequence conservation. Our results show that Cα positions are substantially equivalent to the other substructures when calculating various measures of residue environment. As a result, the overlapping contributions of Cα atoms and the other substructures are high, yielding similar structure-conservation relationships. Taking WCN as an example, the average overlapping contribution to sequence conservation between the Cα and all-atom substructures is 87%. These results indicate that the Cα atoms of a protein structure alone can reflect sequence conservation at the residue level.
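The WCN measure can be illustrated with a minimal sketch. It assumes the commonly used inverse-square definition WCN_i = Σ_{j≠i} 1/r_ij², where r_ij is the distance between the representative points of residues i and j (Cα atoms, or, for comparison, another substructure such as side-chain centroids); the abstract does not state the formula itself. The function names weighted_contact_number and pearson and the toy coordinates are hypothetical. The Pearson correlation is used here only as a simple way to compare WCN profiles computed from two substructures; it is not the paper's "overlapping contribution" measure.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def weighted_contact_number(coords: Sequence[Coord]) -> List[float]:
    """Hypothetical helper: WCN_i = sum over j != i of 1 / r_ij^2.

    `coords` holds one representative point per residue (e.g. its Cα atom).
    The inverse-square weighting is an assumption, not taken from the abstract.
    All points must be distinct, otherwise r_ij^2 would be zero.
    """
    n = len(coords)
    wcn = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Squared Euclidean distance between residues i and j.
            r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            w = 1.0 / r2
            # Each pair contributes symmetrically to both residues.
            wcn[i] += w
            wcn[j] += w
    return wcn


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy coordinates (hypothetical): Cα atoms vs. side-chain centroids.
    ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.5, 0.0), (9.0, 4.0, 1.0)]
    centroids = [(0.5, 1.0, 0.2), (4.0, 1.2, 0.3), (7.9, 1.8, 0.4), (9.5, 5.0, 1.5)]
    wcn_ca = weighted_contact_number(ca)
    wcn_sc = weighted_contact_number(centroids)
    print("WCN (Cα):      ", [round(v, 4) for v in wcn_ca])
    print("WCN (centroid):", [round(v, 4) for v in wcn_sc])
    print("Pearson r:", round(pearson(wcn_ca, wcn_sc), 3))
```

A pure-Python double loop keeps the sketch dependency-free; for real proteins with hundreds of residues, a vectorized distance matrix would be the more practical choice.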