Solis, Armando D.
Biological Sciences Department, New York City College of Technology, The City University of New York, Brooklyn, New York, United States of America.
PLoS One. 2014 Jun 4;9(6):e94334. doi: 10.1371/journal.pone.0094334. eCollection 2014.
The most informative probability distribution functions (PDFs) describing the Ramachandran phi-psi dihedral angle pair, a fundamental descriptor of protein backbone conformation, are derived from high-resolution X-ray crystal structures using an information-theoretic approach. The Information Maximization Device (IMD) is established from fundamental information-theoretic concepts and then applied to derive highly resolved phi-psi maps for all 20 single-amino-acid sequences and all 8000 triplet sequences, at an optimal resolution determined by the volume of currently available data. The paper shows that using the latent information contained in all viable high-resolution crystal structures in the Protein Data Bank (PDB), totaling more than 77,000 chains, permits the derivation of a large number of optimized sequence-dependent PDFs. This work demonstrates the effectiveness of the IMD and the superiority of the resulting PDFs through extensive fold-recognition experiments and rigorous comparisons with previously published triplet PDFs. Because IMD optimizes PDFs automatically, it improves the performance of knowledge-based potentials, which rely on such PDFs. Furthermore, it provides an easy computational recipe for empirically deriving other kinds of sequence-dependent structural PDFs with greater detail and precision. The high-resolution phi-psi maps derived in this work are available for download.
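The abstract does not spell out how IMD selects the "optimal resolution" of a phi-psi map, so the sketch below does not reproduce IMD. As a hedged illustration only, it swaps in a simpler and different technique: it bins (phi, psi) pairs into square histograms at several candidate resolutions and keeps the one with the highest held-out log-likelihood, which captures the same trade-off (finer grids need more data). It then turns two such PDFs into a knowledge-based energy of the common form E = -ln[P(phi, psi | seq) / P(phi, psi)], an assumption rather than the paper's exact potential. All function names (`bin_index`, `histogram_pdf`, `choose_resolution`, `potential`) are hypothetical, and the demo data are synthetic, not PDB-derived.

```python
import math
import random


def wrap(angle):
    """Wrap an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def bin_index(angle, n_bins):
    """Map an angle in [-180, 180] degrees to one of n_bins equal-width bins."""
    width = 360.0 / n_bins
    idx = int((angle + 180.0) // width)
    return min(max(idx, 0), n_bins - 1)  # clamp +180 into the last bin


def histogram_pdf(pairs, n_bins, pseudocount=1.0):
    """Estimate a discrete phi-psi PDF (probability per grid cell) on an n_bins x n_bins grid.

    The pseudocount keeps every cell non-zero so log-probabilities stay finite.
    """
    counts = [[pseudocount] * n_bins for _ in range(n_bins)]
    for phi, psi in pairs:
        counts[bin_index(phi, n_bins)][bin_index(psi, n_bins)] += 1
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


def mean_log_density(pdf, pairs):
    """Average log probability *density* of pairs; dividing by cell area makes grids comparable."""
    n_bins = len(pdf)
    cell_area = (360.0 / n_bins) ** 2
    total = 0.0
    for phi, psi in pairs:
        p = pdf[bin_index(phi, n_bins)][bin_index(psi, n_bins)]
        total += math.log(p / cell_area)
    return total / len(pairs)


def choose_resolution(pairs, candidate_bins, holdout_fraction=0.2, seed=0):
    """Stand-in for resolution selection (NOT IMD): pick the grid with the best held-out likelihood."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * holdout_fraction))
    test, train = shuffled[:n_test], shuffled[n_test:]
    scores = {n: mean_log_density(histogram_pdf(train, n), test) for n in candidate_bins}
    return max(scores, key=scores.get), scores


def potential(pdf_seq, pdf_ref, phi, psi):
    """Assumed knowledge-based energy in kT units: -ln(P(phi,psi|seq) / P(phi,psi)).

    Both PDFs must share the same grid size.
    """
    n_bins = len(pdf_seq)
    i, j = bin_index(phi, n_bins), bin_index(psi, n_bins)
    return -math.log(pdf_seq[i][j] / pdf_ref[i][j])


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic pairs loosely clustered near the alpha-helical and beta-strand regions.
    pairs = []
    for _ in range(2000):
        centre = (-63.0, -43.0) if rng.random() < 0.6 else (-120.0, 130.0)
        pairs.append((wrap(rng.gauss(centre[0], 15.0)), wrap(rng.gauss(centre[1], 15.0))))
    best, scores = choose_resolution(pairs, [4, 9, 18, 36, 72])
    print("chosen bins per axis:", best)
```

With more data, finer grids stop being penalized by sparse cells and the held-out criterion tends to favor them, which loosely mirrors the abstract's point that the attainable resolution is set by the volume of available data.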