Abriata Luciano A, Kinch Lisa N, Tamò Giorgio E, Monastyrskyy Bohdan, Kryshtafovych Andriy, Dal Peraro Matteo
Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.
Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas.
Proteins. 2018 Mar;86 Suppl 1:16-26. doi: 10.1002/prot.25403. Epub 2017 Oct 24.
For assessment purposes, CASP targets are split into evaluation units (EUs). Here we present the official definition of the CASP12 EUs and their classification into difficulty categories. Each target can be evaluated as one EU (the whole target) and/or as several EUs (separate structural domains or groups of structural domains). The specific scenario for a target split is determined by the domain organization of available templates, the difference in server performance on separate domains versus the combination of the domains, and visual inspection. In the end, 71 targets were split into 96 EUs. Classification of the EUs into difficulty categories was done semi-automatically with the assistance of metrics provided by the Prediction Center. These metrics account for sequence and structural similarities of the EUs to potential structural templates from the Protein Data Bank, and for the baseline performance of automated server predictions. The metrics readily separate the 96 EUs into 38 that should be straightforward for template-based modeling (TBM) and 39 that are expected to be hard for homology modeling and are thus left for free modeling (FM). The remaining 19 borderline EUs were dubbed FM/TBM and were inspected case by case. The article also gives an overview of the structural and evolutionary features of selected targets relevant to our accompanying article, which presents the assessment of FM and FM/TBM predictions, and of the structural features of the hardest EUs in the FM category. Finally, we suggest improvements to the EU definition and classification procedures.
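The three-way split described in the abstract (TBM, FM, and a borderline FM/TBM band that is then inspected by hand) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration only: the `EvaluationUnit` fields, the 0-100 score scale, the simple averaging of the two scores, and both cutoffs are hypothetical and are not the metrics or thresholds used by the Prediction Center.

```python
from dataclasses import dataclass


@dataclass
class EvaluationUnit:
    """Hypothetical record for one EU; field names are illustrative."""
    name: str
    template_similarity: float  # assumed 0-100: similarity to the best PDB template
    server_baseline: float      # assumed 0-100: baseline quality of server models


def classify(eu: EvaluationUnit,
             tbm_cutoff: float = 60.0,
             fm_cutoff: float = 40.0) -> str:
    """Assign a difficulty category from a combined score.

    The averaging and both cutoffs are placeholders, not CASP12 values.
    """
    score = (eu.template_similarity + eu.server_baseline) / 2
    if score >= tbm_cutoff:
        return "TBM"      # good templates and good server models
    if score < fm_cutoff:
        return "FM"       # no usable template, poor server models
    return "FM/TBM"       # borderline: left for case-by-case inspection


if __name__ == "__main__":
    units = [
        EvaluationUnit("EU-a", 80.0, 70.0),
        EvaluationUnit("EU-b", 20.0, 30.0),
        EvaluationUnit("EU-c", 50.0, 50.0),
    ]
    for eu in units:
        print(eu.name, classify(eu))
```

The point of the sketch is the shape of the procedure: two automatic cutoffs settle the clear cases, and only the units falling between them need manual review, which matches the semi-automatic workflow the abstract describes.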