Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA.
Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA.
Structure. 2022 Oct 6;30(10):1385-1394.e3. doi: 10.1016/j.str.2022.08.004. Epub 2022 Aug 31.
Approximately 87% of the more than 190,000 atomic-level three-dimensional (3D) biostructures in the PDB were determined using macromolecular crystallography (MX). Agreement between 3D atomic coordinates and experimental data for >100 million individual amino acid residues occurring within ∼150,000 PDB MX structures was analyzed in detail. The real-space correlation coefficient (RSCC), calculated for each residue from its 3D atomic coordinates and the experimental-data-derived electron density, enables detection of unreliable atomic coordinates as outliers (particularly important for poorly resolved side-chain atoms) and ready evaluation of local structure quality by PDB users. For human protein MX structures in the PDB, comparisons of the per-residue RSCC metric with AlphaFold2-computed structure model confidence (pLDDT, the predicted local distance difference test) document (1) that RSCC values and pLDDT scores are correlated (median correlation coefficient ∼0.41), and (2) that experimentally determined MX structures (3.5 Å resolution or better) are more reliable than AlphaFold2-computed structure models and should be used preferentially whenever possible.
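As an illustration of the per-residue metric described above, RSCC is in essence a Pearson correlation between electron density derived from the experimental data and density calculated from the atomic model, sampled at the same grid points around a residue. The following is a minimal sketch under stated assumptions, not the pipeline used in the study: the function names `rscc` and `flag_low_rscc`, the residue labels, and the 0.8 cutoff are hypothetical choices made for illustration, and the density samples are assumed to have been extracted already.

```python
from math import sqrt


def rscc(rho_obs, rho_calc):
    """Pearson correlation between observed and model-calculated density.

    Both arguments are equal-length sequences of density values sampled
    at the same grid points around one residue (assumed to be precomputed).
    """
    if len(rho_obs) != len(rho_calc) or len(rho_obs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(rho_obs)
    mean_obs = sum(rho_obs) / n
    mean_calc = sum(rho_calc) / n
    cov = sum((o - mean_obs) * (c - mean_calc) for o, c in zip(rho_obs, rho_calc))
    var_obs = sum((o - mean_obs) ** 2 for o in rho_obs)
    var_calc = sum((c - mean_calc) ** 2 for c in rho_calc)
    if var_obs == 0 or var_calc == 0:
        raise ValueError("density is flat; correlation is undefined")
    return cov / sqrt(var_obs * var_calc)


def flag_low_rscc(per_residue, threshold=0.8):
    """Return residue labels whose RSCC falls below the cutoff.

    The 0.8 default is an illustrative assumption, not a value taken
    from the abstract.
    """
    return [label for label, value in per_residue.items() if value < threshold]


# Hypothetical per-residue values keyed by chain:residue-number labels.
scores = {"A:10": 0.95, "A:11": 0.62, "A:12": 0.88}
suspect = flag_low_rscc(scores)
```

Values near 1 indicate that the modeled atoms sit in well-supported density, while low values mark residues whose coordinates deserve closer inspection before use.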