Yuan Rongqing, Zhang Jing, Kryshtafovych Andriy, Schaeffer R Dustin, Zhou Jian, Cong Qian, Grishin Nick V
Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA.
Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA.
bioRxiv. 2025 Jun 2:2025.05.29.656942. doi: 10.1101/2025.05.29.656942.
The assessment of monomer targets in the Critical Assessment of Structure Prediction Round 16 (CASP16) underscores that the problem of single-domain protein fold prediction is nearly solved: no target folds were missed across all Evaluation Units. However, challenges remain in accurately modeling truncated sequences, irregular secondary structures, and interchain-induced conformational changes. The release of AlphaFold3 (AF3) during CASP16, and its effective integration by many groups, demonstrated its superiority over AlphaFold2 (AF2), particularly in confidence estimation and model selection. Additional improvements in multiple sequence alignments (MSAs) and construct design, i.e., selecting the optimal fragment of the full sequence for modeling, also contributed to enhanced prediction accuracy. The top three groups, all from the Yang lab, consistently outperformed others across CASP16 monomer targets, reflecting their robust modeling pipelines and successful adoption of AF3. CASP16 also introduced three new challenges: Phase 0, in which stoichiometry was withheld; Phase 2, which supplied ~8,000 MassiveFold models per target to test model selection strategies; and Model 6, which limited predictors to using MSAs provided by the organizers. While we evaluated group performance in these additional challenges, the insights gained were limited due to low participation and design flaws in the experiments. We suggest improvements for the organization of these challenges and encourage broader engagement from the prediction community. The progress in monomer modeling from CASP15 to CASP16 was subtle, but more groups in CASP16 were able to outperform ColabFold, reflecting increased expertise in optimizing AF2 and the growing adoption of AF3. We anticipate that the recent release of the AF3 source code will stimulate future progress through user-driven optimization and innovations in model architecture. Finally, model ranking remains a persistent weakness across most groups, highlighting a critical area for future development.