Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America.
Harris School of Public Policy and Center for Data Science and Public Policy, University of Chicago, Chicago, IL, United States of America.
PLoS Comput Biol. 2018 Nov 29;14(11):e1006626. doi: 10.1371/journal.pcbi.1006626. eCollection 2018 Nov.
The conformational dynamics of proteins is rarely used in methods that predict the impact of genetic mutations, because three-dimensional (3D) protein structures are scarce compared with the vast number of available sequences. Until now, a 3D structure has been required to predict the conformational dynamics of a protein. We introduce an approach that estimates the conformational dynamics of a protein without relying on structural information. This de novo approach uses coevolving residues identified from a multiple sequence alignment (MSA) with Potts models. These coevolving residues serve as contacts in a Gaussian network model (GNM) to obtain protein dynamics. B-factors calculated with this sequence-based GNM (Seq-GNM) agree with crystallographic B-factors as well as with theoretical B-factors from the original GNM, which relies on the 3D structure. Moreover, we demonstrate that B-factors calculated with Seq-GNM can discriminate genomic variants according to their phenotypes for a wide range of proteins. These results suggest that protein dynamics can be approximated from sequence information alone, making it possible to assess the phenotypes of non-synonymous single nucleotide variants (nSNVs) in cases where a 3D structure is unknown. We hope this work will promote the use of dynamics information in genetic disease prediction at scale by circumventing the need for 3D structures.
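The step from contacts to B-factors can be illustrated with a minimal sketch of a standard GNM calculation. In a GNM, the contacts define a Kirchhoff (connectivity) matrix, and the B-factor of each residue is proportional to the corresponding diagonal element of that matrix's pseudo-inverse. The code below is not the authors' implementation: the function name `seq_gnm_bfactors`, the unweighted treatment of contacts, and the optional inclusion of sequence-adjacent (i, i+1) links are assumptions made for illustration, and the contact list is taken as given rather than derived from a Potts model.

```python
import numpy as np


def seq_gnm_bfactors(n_residues, contacts, include_backbone=True):
    """Relative B-factors from a Gaussian network model built on a contact list.

    contacts: iterable of (i, j) zero-based residue index pairs, e.g. pairs
    predicted to coevolve. All contacts get the same spring constant here
    (an assumption of this sketch).
    """
    kirchhoff = np.zeros((n_residues, n_residues))

    pairs = list(contacts)
    if include_backbone:
        # Assumption: link sequence neighbours so the network stays connected.
        pairs += [(i, i + 1) for i in range(n_residues - 1)]

    for i, j in pairs:
        if i != j:
            # Off-diagonal entries are -1 for every pair in contact.
            kirchhoff[i, j] = kirchhoff[j, i] = -1.0

    # Diagonal entries hold the number of contacts of each residue.
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))

    # The Kirchhoff matrix is singular (one zero mode for a connected
    # network), so the Moore-Penrose pseudo-inverse is used.
    pseudo_inverse = np.linalg.pinv(kirchhoff)

    # Mean-square fluctuations, and hence B-factors, are proportional to the
    # diagonal; the physical prefactor is omitted, so values are relative.
    return np.diag(pseudo_inverse).copy()


if __name__ == "__main__":
    # Toy example: a 5-residue chain with no long-range contacts.
    print(seq_gnm_bfactors(5, []))
```

In the toy chain, the terminal residues have the fewest contacts and therefore the largest relative B-factors; adding a contact between the two ends closes the chain into a ring and makes all residues equivalent. Because the output is only defined up to a scale factor, comparison with crystallographic B-factors would typically rely on correlation rather than absolute values.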