University of Chinese Academy of Sciences, Beijing 100049, China; iHuman Institute, ShanghaiTech University, Shanghai 201210, China; Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China.
University of Chinese Academy of Sciences, Beijing 100049, China; iHuman Institute, ShanghaiTech University, Shanghai 201210, China; School of Life Science and Technology, ShanghaiTech University, Shanghai 201210, China.
Anal Chim Acta. 2018 Oct 31;1029:50-57. doi: 10.1016/j.aca.2018.05.001. Epub 2018 May 4.
Data analysis is a key challenge in untargeted metabolomics studies, commonly requiring extensive processing of thousands of metabolite peaks in raw high-resolution MS data. Although a number of software packages have been developed to facilitate untargeted data processing, their capabilities in feature detection, quantification, and marker selection have not been comprehensively evaluated using a well-defined benchmark sample set. In this study, we acquired a benchmark dataset from standard mixtures consisting of 1100 compounds at specified concentration ratios, including 130 compounds whose concentrations varied significantly. The five software packages evaluated here (MS-Dial, MZmine 2, XCMS, MarkerView, and Compound Discoverer) showed similar performance in detecting true features derived from compounds in the mixtures. However, they differed significantly in the relative quantification of true features in the benchmark dataset. MZmine 2 outperformed the other software in quantification accuracy, and it reported the most true discriminating markers together with the fewest false markers. Furthermore, we assessed the selection of discriminating markers by different software using both the benchmark dataset and a real-case metabolomics dataset, and on this basis propose the combined use of two software packages to increase confidence in biomarker identification. Our findings from this comprehensive evaluation of untargeted metabolomics software should help guide future improvements of these widely used bioinformatics tools and enable users to properly interpret their metabolomics results.
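The benchmark logic summarized in the abstract (score each package's reported ratios against known concentration ratios, count true and false discriminating markers, then combine two packages) might be sketched roughly as follows. This is only an illustrative sketch, not the authors' pipeline: the abstract does not state the marker criteria, so the 2-fold cutoff, the log2 error metric, the intersection rule for combining two packages, and all names (Feature, select_markers, score_markers, consensus_markers) are assumptions. The toy values are made up for demonstration and are not results from the study.

```python
import math
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Feature:
    compound: str          # identifier of the spiked-in compound (hypothetical)
    expected_ratio: float  # known concentration ratio between the two mixtures
    measured_ratio: float  # ratio reported by one software package


def median_abs_log2_error(features):
    """Quantification accuracy: median |log2(measured) - log2(expected)|."""
    return median(
        abs(math.log2(f.measured_ratio) - math.log2(f.expected_ratio))
        for f in features
    )


def select_markers(features, fold_threshold=2.0):
    """Compounds whose measured ratio changes by at least fold_threshold
    in either direction (assumed criterion, not taken from the paper)."""
    cutoff = math.log2(fold_threshold)
    return {
        f.compound for f in features
        if abs(math.log2(f.measured_ratio)) >= cutoff
    }


def score_markers(selected, true_markers):
    """Return (true positives, false positives, false negatives)."""
    return (
        len(selected & true_markers),
        len(selected - true_markers),
        len(true_markers - selected),
    )


def consensus_markers(selected_a, selected_b):
    """Keep only markers reported by both packages (assumed combination rule)."""
    return selected_a & selected_b


# Toy illustration with made-up numbers; c1 and c2 are the "true" markers.
true_markers = {"c1", "c2"}
tool_a = [
    Feature("c1", 4.0, 3.8), Feature("c2", 0.25, 0.3),
    Feature("c3", 1.0, 1.1), Feature("c4", 1.0, 2.5),
]
tool_b = [
    Feature("c1", 4.0, 4.1), Feature("c2", 0.25, 0.6),
    Feature("c3", 1.0, 2.2), Feature("c4", 1.0, 0.9),
]

markers_a = select_markers(tool_a)
markers_b = select_markers(tool_b)
both = consensus_markers(markers_a, markers_b)

for name, selected in (("A", markers_a), ("B", markers_b), ("A and B", both)):
    tp, fp, fn = score_markers(selected, true_markers)
    print(f"{name}: true={tp} false={fp} missed={fn}")
```

In this toy case the intersection removes the false markers that each hypothetical package reports on its own, at the cost of missing one true marker. That trade-off is the general behavior to expect from an intersection rule; whether it matches the combination strategy proposed in the paper cannot be determined from the abstract alone.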