Gatto Laurent, Hansen Kasper D, Hoopmann Michael R, Hermjakob Henning, Kohlbacher Oliver, Beyer Andreas
Computational Proteomics Unit and Cambridge Centre for Proteomics, University of Cambridge, Cambridge CB2 1QR, United Kingdom.
Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland 21205, United States.
J Proteome Res. 2016 Mar 4;15(3):809-14. doi: 10.1021/acs.jproteome.5b00852. Epub 2015 Nov 17.
High-throughput methods based on mass spectrometry (proteomics, metabolomics, lipidomics, etc.) produce a wealth of data that cannot be analyzed without computational methods. The impact of the choice of method on the overall result of a biological study is often underappreciated, but different methods can lead to very different biological findings. It is thus essential to evaluate and compare the correctness and relative performance of computational methods. The volume of the data and the complexity of the algorithms make unbiased comparisons challenging. This paper discusses some problems and challenges in the testing and validation of computational methods. We discuss the different types of data (simulated and experimental validation data) as well as different metrics for comparing methods. We also introduce a new public repository for mass spectrometric reference data sets (http://compms.org/RefData) that contains a collection of publicly available data sets for evaluating the performance of a wide range of methods.