Lange Eva, Tautenhahn Ralf, Neumann Steffen, Gröpl Clemens
Beatson Institute for Cancer Research, Proteomics and Mass Spectrometry Group, Scotland, UK.
BMC Bioinformatics. 2008 Sep 15;9:375. doi: 10.1186/1471-2105-9-375.
Liquid chromatography coupled to mass spectrometry (LC-MS) has become a prominent tool for the analysis of complex proteomics and metabolomics samples. In many applications multiple LC-MS measurements need to be compared, e.g., to improve reliability or to combine results from different samples in a statistical comparative analysis. As in all physical experiments, LC-MS data are affected by uncertainties, and variability of retention time is encountered in all data sets. It is therefore necessary to estimate and correct the underlying distortions of the retention time axis before searching for corresponding compounds in different samples. To this end, a variety of so-called LC-MS map alignment algorithms have been developed during the last four years. Most of these approaches are well documented, but they are usually evaluated on very specific samples only. So far, no publication has assessed different alignment algorithms using a standard LC-MS sample along with commonly used quality criteria.
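To make the idea of estimating and correcting a retention time distortion concrete, the following is a minimal sketch, not one of the alignment algorithms evaluated in this study. It assumes that a few landmark compounds have already been matched between a sample run and a reference run, and that the distortion is approximately linear; real LC-MS distortions are often nonlinear. The function names and the landmark values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def fit_linear_rt_map(rt_sample, rt_reference):
    """Least-squares fit of rt_reference ~ slope * rt_sample + intercept.

    Both arguments are retention times (e.g. in seconds) of the same
    landmark compounds, observed in the sample run and the reference run.
    """
    if len(rt_sample) != len(rt_reference) or len(rt_sample) < 2:
        raise ValueError("need at least two matched landmark pairs")
    mx, my = mean(rt_sample), mean(rt_reference)
    sxx = sum((x - mx) ** 2 for x in rt_sample)
    if sxx == 0:
        raise ValueError("sample retention times must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(rt_sample, rt_reference))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def correct_rt(rt, slope, intercept):
    """Map a sample retention time onto the reference time axis."""
    return slope * rt + intercept


# Hypothetical landmark pairs (assumed values, not taken from the study).
sample_rt = [100.0, 200.0, 300.0, 400.0]
reference_rt = [105.0, 207.0, 309.0, 411.0]

slope, intercept = fit_linear_rt_map(sample_rt, reference_rt)
aligned_rt = [correct_rt(rt, slope, intercept) for rt in sample_rt]
```

After the correction, features from the two runs can be compared on a common time axis, for instance by pairing those whose corrected retention times and m/z values fall within chosen tolerances.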
We propose two LC-MS proteomics as well as two LC-MS metabolomics data sets that represent typical alignment scenarios. Furthermore, we introduce a new quality measure for the evaluation of LC-MS alignment algorithms. Using the four data sets to compare six freely available alignment algorithms proposed for the alignment of metabolomics and proteomics LC-MS measurements, we found significant differences with respect to alignment quality, running time, and usability in general.
The multitude of available alignment methods necessitates the generation of standard data sets and quality measures that allow users as well as developers to benchmark and compare their map alignment tools on a fair basis. Our study represents a first step in this direction. Currently, installing a tool and finding the "correct" parameter settings can be quite time-consuming, and the success of a particular method still depends heavily on the experience of the user. Therefore, we propose to continue this type of study and extend it into a community-wide competition. All data as well as our evaluation scripts are available at http://msbi.ipb-halle.de/msbi/caap.