Hauschild Anne-Christin, Kopczynski Dominik, D'Addario Marianna, Baumbach Jörg Ingo, Rahmann Sven, Baumbach Jan
Computational Systems Biology Group, Max Planck Institute for Informatics, Saarbrücken, Germany.
Computer Science XI and Collaborative Research Center SFB 876, TU Dortmund, Germany.
Metabolites. 2013 Apr 16;3(2):277-93. doi: 10.3390/metabo3020277.
Ion mobility spectrometry with pre-separation by multi-capillary columns (MCC/IMS) has become an established, inexpensive, non-invasive bioanalytics technology for detecting volatile organic compounds (VOCs), with various metabolomics applications in medical research. To pave the way for this technology towards daily usage in medical practice, several steps still have to be taken. With respect to modern biomarker research, one of the most important tasks is the automatic classification of patient-specific data sets into different groups, for instance healthy or not. Although sophisticated machine learning methods exist, an inevitable preprocessing step is reliable and robust peak detection without manual intervention. In this work we evaluate four state-of-the-art approaches for automated IMS-based peak detection: local maxima search, watershed transformation with IPHEx, region-merging with VisualNow, and peak model estimation (PME). We manually generated a gold standard with the aid of a domain expert (manual) and compare the performance of the four peak calling methods with respect to two distinct criteria. First, we utilize established machine learning methods and systematically study their classification performance based on the four peak detectors' results. Second, we investigate the classification variance and robustness regarding perturbation and overfitting. Our main finding is that the classification accuracy is almost equally good for all methods, the manually created gold standard as well as the four automatic peak finding methods. In addition, we note that all tools, manual and automatic, are similarly robust against perturbations. However, the classification performance is more robust against overfitting when using PME as the peak calling preprocessor. In summary, we conclude that all methods, though small differences exist, are largely reliable and enable a wide spectrum of real-world biomedical applications.
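To make the simplest of the four approaches concrete, the following is a minimal sketch of a local maxima search. It is not the implementation evaluated in the paper. It assumes that a measurement is available as a rectangular 2D intensity matrix (rows and columns standing for the two separation axes) and that a cell counts as a peak when it exceeds a noise threshold and is strictly greater than all of its up to eight neighbours. The function name `local_maxima`, the `threshold` parameter, and the toy data are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

Peak = Tuple[int, int, float]  # (row index, column index, intensity)


def local_maxima(matrix: List[List[float]], threshold: float = 0.0) -> List[Peak]:
    """Return all cells that lie above `threshold` and are strictly greater
    than every neighbouring cell (8-neighbourhood, clipped at the borders).

    Assumption: `matrix` is rectangular (all rows have the same length).
    """
    peaks: List[Peak] = []
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    for r in range(n_rows):
        for c in range(n_cols):
            value = matrix[r][c]
            if value <= threshold:
                continue  # treat low intensities as noise
            neighbours = [
                matrix[rr][cc]
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(n_cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c, value))
    return peaks


# Toy intensity matrix with two well-separated signals (made-up values).
spectrum = [
    [0, 1, 0, 0, 0],
    [1, 9, 1, 0, 0],
    [0, 1, 0, 2, 7],
    [0, 0, 0, 1, 2],
]
print(local_maxima(spectrum, threshold=3))
```

In a classification setting such as the one described above, the resulting peak lists would typically be aligned across measurements and turned into a feature matrix. That step is outside the scope of this sketch.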