Genome Center, University of California, Davis, California, USA.
Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, England, UK.
Proteins. 2023 Dec;91(12):1558-1570. doi: 10.1002/prot.26533. Epub 2023 May 31.
This study describes how CASP15 targets were processed into evaluation units (EUs) and assigned to evolution-based prediction classes. The targets were first split into structural domains based on compactness and similarity to other proteins. Models were then evaluated against these domains and their combinations. Domains were joined into a larger EU if predictors performed similarly on the combined unit and on the individual domains; if most predictors performed better on the individual domains, the domains were retained as separate EUs. As a result, 112 EUs were defined from 77 tertiary structure prediction targets. The EUs were assigned to four prediction classes that roughly correspond to the target difficulty categories of previous CASPs: TBM (template-based modeling, easy or hard), FM (free modeling), and the TBM/FM overlap category. More than a third of CASP15 EUs fell into the FM class, historically the most challenging, in which neither homology nor structural analogy to proteins of known fold can be detected.
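The join-or-split rule for domains can be illustrated with a minimal sketch. It is not the assessors' actual procedure: the function name `should_split`, the use of a mean per-domain score, and the `margin` and `majority` thresholds are all assumptions made for illustration, and the abstract does not state which score or cutoffs were used.

```python
from statistics import mean


def should_split(combined_scores, domain_scores, margin=10.0, majority=0.5):
    """Return True if the domains should stay separate EUs (hypothetical rule).

    combined_scores: one score per predictor on the joined unit.
    domain_scores:   per predictor, a list of scores on each individual domain.
    margin, majority: illustrative thresholds, not values from the study.
    """
    if len(combined_scores) != len(domain_scores):
        raise ValueError("need one entry per predictor in both inputs")
    if not combined_scores:
        raise ValueError("no predictors supplied")

    # Count predictors that do clearly better on the individual domains
    # than on the combined unit.
    better_on_domains = sum(
        1
        for combined, per_domain in zip(combined_scores, domain_scores)
        if mean(per_domain) - combined > margin
    )
    # Keep the domains separate only if most predictors show that gap.
    return better_on_domains / len(combined_scores) > majority


# Similar performance on the combined unit and the domains: join into one EU.
join_case = should_split([80, 78, 82], [[82, 81], [79, 80], [83, 84]])

# Most predictors do much better on individual domains: keep separate EUs.
split_case = should_split([40, 45, 70], [[80, 75], [78, 82], [72, 74]])
```

In the first call no predictor gains more than the assumed margin from splitting, so the domains would be joined. In the second, two of three predictors do, so the domains would remain separate EUs.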