Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, British Columbia V6T 1Z1, Canada.
Anal Chem. 2022 Mar 15;94(10):4260-4268. doi: 10.1021/acs.analchem.1c04758. Epub 2022 Mar 4.
Choosing appropriate data processing parameters is critical in processing liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics data. The conventional design of experiments (DOE) approach is time-consuming and offers no intuitive explanation of why the selected parameters generate the best results. After studying commonly used metabolomics data processing software, this work identified a set of universal parameters: mass tolerance, peak height, peak width, and instrumental shift. These universal parameters are shared among different feature extraction programs and are critical to metabolic feature extraction. We then developed Paramounter, an R program that automatically measures these universal parameters from raw LC-MS-based metabolomics data prior to metabolic feature extraction. This is made possible by novel concepts including rank-based intensity sorting and the zone of interest, among others. Paramounter also translates the universal parameters into software-specific parameters for data processing in different programs. Applying Paramounter was shown to yield a threefold increase in the number of extracted metabolites compared with default parameters in MS-DIAL-based feature extraction. Furthermore, a comparison of Paramounter, AutoTuner, and IPO showed that Paramounter generates 3.7- and 1.6-fold more true positive features than AutoTuner and IPO, respectively. Further validation of Paramounter on 11 datasets covering different sample types, data acquisition modes, and MS vendors showed that Paramounter is a convenient and robust program. Overall, the proposed universal parameters and the development of Paramounter address a critical need in metabolomics data processing, transforming metabolomics feature extraction from a "black box" into a "white box." Paramounter is freely available on GitHub (https://github.com/HuanLab/Paramounter).
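To make the idea of "universal" versus "software-specific" parameters concrete, the following is a minimal Python sketch under stated assumptions; it is not Paramounter's algorithm (Paramounter is written in R, and its rank-based intensity sorting and zone-of-interest methods are not reproduced here). It estimates a mass tolerance in ppm for a single ion from its per-scan m/z values, then maps a set of universal parameters onto two hypothetical parameter styles: a ppm-based one and an absolute-Da one. The class `UniversalParams`, the functions, and all dictionary keys are illustrative names, not the options of any real program; the only fixed relationship used is the standard conversion Da = ppm × m/z / 10^6.

```python
from dataclasses import dataclass


@dataclass
class UniversalParams:
    """Hypothetical container for the four universal parameters."""
    mass_tol_ppm: float     # mass tolerance, parts per million
    min_peak_height: float  # intensity threshold for a real peak
    peak_width_s: tuple     # (min, max) chromatographic peak width, seconds
    rt_shift_s: float       # instrumental retention-time shift, seconds


def ppm_deviation(mz: float, ref_mz: float) -> float:
    """Signed mass error of mz relative to ref_mz, in ppm."""
    return (mz - ref_mz) / ref_mz * 1e6


def mass_tolerance_ppm(scan_mzs, scan_intensities) -> float:
    """Illustrative estimate for one ion: the largest |ppm| deviation of its
    per-scan m/z values from their intensity-weighted mean m/z."""
    total = sum(scan_intensities)
    ref = sum(m * i for m, i in zip(scan_mzs, scan_intensities)) / total
    return max(abs(ppm_deviation(m, ref)) for m in scan_mzs)


def ppm_to_da(ppm: float, mz: float) -> float:
    """Convert a relative tolerance (ppm) to an absolute one (Da) at m/z."""
    return ppm * mz / 1e6


def to_software_params(p: UniversalParams, ref_mz: float = 500.0) -> dict:
    """Hypothetical translation into a ppm-based style and a Da-based style.
    Da-based tools need a reference m/z because ppm scales with mass."""
    return {
        "ppm_style": {
            "ppm": p.mass_tol_ppm,
            "peakwidth": p.peak_width_s,
            "noise": p.min_peak_height,
            "rt_tolerance_s": p.rt_shift_s,
        },
        "da_style": {
            "ms1_tolerance_da": round(ppm_to_da(p.mass_tol_ppm, ref_mz), 4),
            "min_peak_height": p.min_peak_height,
            "rt_tolerance_s": p.rt_shift_s,
        },
    }


if __name__ == "__main__":
    # Three scans of one ion drifting by +/- 0.001 around m/z 200.
    tol = mass_tolerance_ppm([200.000, 200.001, 199.999], [1.0, 1.0, 1.0])
    print(round(tol, 2))  # about 5 ppm
    params = UniversalParams(tol, 1000.0, (5.0, 30.0), 6.0)
    print(to_software_params(params))
```

The design choice illustrated is that a single measured quantity (here, the ppm mass tolerance) can feed programs with different conventions; a 10 ppm tolerance, for example, corresponds to 0.005 Da at m/z 500 but only 0.001 Da at m/z 100.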