Tamò Giorgio E, Abriata Luciano A, Fonti Giulia, Dal Peraro Matteo
Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.
Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.
Proteins. 2018 Mar;86 Suppl 1:215-227. doi: 10.1002/prot.25442. Epub 2017 Dec 26.
Integrative modeling approaches attempt to combine experiments and computation to derive structure-function relationships in complex molecular assemblies. Despite their importance for the advancement of the life sciences, benchmarking of existing methodologies is rather poor. The 12th round of the Critical Assessment of protein Structure Prediction (CASP12) offered a unique opportunity to benchmark data and methods from two kinds of experiments often used in integrative modeling: residue-residue contacts obtained through crosslinking/mass spectrometry (CLMS), and small-angle X-ray scattering (SAXS). Upon assessing the models submitted by predictors for 3 targets assisted by CLMS data and 11 targets assisted by SAXS data, we observed no significant improvement over the best data-blind models, although most predictors did improve relative to their own data-blind predictions. Only for target Tx892 in the CLMS-assisted category and target Ts947 in the SAXS-assisted category was there a net, albeit mild, improvement relative to the best data-blind predictions. We discuss possible reasons for this limited success, which point to inconsistencies in the data sources rather than in the methods; a few groups were less sensitive to these inconsistencies. We conclude with suggestions that could improve the potential of data integration in future CASP rounds in terms of experimental data production, methods development, data management and prediction assessment.