DTU Chemistry, Technical University of Denmark, Building 206, Kgs. Lyngby 2800, Denmark.
J Chem Inf Model. 2022 Jul 25;62(14):3391-3400. doi: 10.1021/acs.jcim.2c00243. Epub 2022 Jul 3.
As only 35% of human proteins feature (often partial) PDB structures, the protein structure prediction tool AlphaFold2 (AF2) could have a massive impact on human biology and medicine, making independent benchmarks of interest. We studied AF2's ability to describe the backbone solvent exposure as a functionally important and easily interpretable "natural coordinate" of protein conformation, using human proteins as a test case. After screening for appropriate comparative sets, we matched 1818 human proteins predicted by AF2 against 7585 unique experimental PDBs, and after curation for sequence overlap, we assessed 1264 comparative pairs comprising 115 unique AF2 structures and 652 unique experimental structures. AF2 performed markedly worse for multimers, whereas ligands, cofactors, and experimental resolution were, interestingly, not very important for performance. AF2 performed excellently for monomer proteins. Challenges relating to specific groups of residues and multimers were analyzed. We identified larger deviations for lower confidence scores (pLDDT), and exposed and polar residues (e.g., Asp, Glu, Asn) were less accurately described than hydrophobic residues. Proline conformations were the hardest to predict, probably due to a common location in dynamic solvent-accessible parts. In summary, using solvent exposure as a metric, we quantified the performance of AF2 for human proteins and provided estimates of the expected agreement as a function of ligand presence, multimer/monomer status, local residue solvent exposure, pLDDT, and amino acid type. Overall performance was found to be excellent.
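The kind of per-residue comparison summarized above can be illustrated with a minimal sketch. It assumes that a solvent exposure value has already been computed for each aligned residue in both the AF2 model and the experimental structure, and it reports the mean absolute deviation grouped by amino acid type or by pLDDT band. The record layout, the field names (`aa`, `plddt`, `exposure_af2`, `exposure_exp`), the example values, and the choice of mean absolute deviation are illustrative assumptions, not the exposure measure or statistics of the study itself. The pLDDT cut-offs of 50, 70, and 90 are the confidence bands commonly used for AlphaFold models and may differ from the binning used in the paper.

```python
from collections import defaultdict
from statistics import mean


def plddt_band(record):
    """Assign a residue to one of the commonly used pLDDT confidence bands."""
    score = record["plddt"]
    if score < 50:
        return "<50"
    if score < 70:
        return "50-70"
    if score < 90:
        return "70-90"
    return ">=90"


def mean_abs_deviation_by(records, key):
    """Group residues by key(record) and average |AF2 - experiment| exposure per group."""
    groups = defaultdict(list)
    for record in records:
        deviation = abs(record["exposure_af2"] - record["exposure_exp"])
        groups[key(record)].append(deviation)
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical, made-up residues for illustration only (not data from the study).
records = [
    {"aa": "LEU", "plddt": 95.0, "exposure_af2": 0.10, "exposure_exp": 0.12},
    {"aa": "ASP", "plddt": 62.0, "exposure_af2": 0.70, "exposure_exp": 0.50},
    {"aa": "PRO", "plddt": 45.0, "exposure_af2": 0.90, "exposure_exp": 0.40},
    {"aa": "LEU", "plddt": 91.0, "exposure_af2": 0.05, "exposure_exp": 0.05},
]

by_residue_type = mean_abs_deviation_by(records, key=lambda r: r["aa"])
by_confidence = mean_abs_deviation_by(records, key=plddt_band)

print(by_residue_type)
print(by_confidence)
```

Passing the grouping rule as a function keeps the same aggregation reusable for other factors named in the abstract, such as monomer/multimer status or ligand presence, provided each record carries the corresponding field.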