Guo Jian, Shen Sam, Liu Min, Wang Chenjingyi, Low Brian, Chen Ying, Hu Yaxi, Xing Shipei, Yu Huaxu, Gao Yu, Fang Mingliang, Huan Tao
Department of Chemistry, Faculty of Science, University of British Columbia, Vancouver Campus, 2036 Main Mall, Vancouver, BC V6T 1Z1, Canada.
School of Civil and Environmental Engineering, Nanyang Technological University, Singapore 639798, Singapore.
Metabolites. 2022 Feb 26;12(3):212. doi: 10.3390/metabo12030212.
Extracting metabolic features from liquid chromatography-mass spectrometry (LC-MS) data has been a long-standing bioinformatic challenge in untargeted metabolomics. Conventional feature extraction algorithms fail to recognize features that have low signal intensities, have poor chromatographic peak shapes, or do not fit the parameter settings. This problem also poses a challenge for MS-based exposome studies, as low-abundance metabolic or exposomic features cannot be automatically recognized from raw data. To address this data processing challenge, we developed an R package, JPA (short for Joint Metabolomic Data Processing and Annotation), to comprehensively extract metabolic features from raw LC-MS data. JPA performs feature extraction by combining a conventional peak picking algorithm with strategies for (1) recognizing features that have poor peak shapes but have tandem mass spectra (MS2) and (2) picking up features from a user-defined targeted list. The performance of JPA in global metabolomics was demonstrated using serially diluted urine samples, in which JPA was able to rescue an average of 25% of the metabolic features that the conventional peak picking algorithm missed because of dilution. More importantly, the chromatographic peak shapes, analytical accuracy, and precision of the rescued metabolic features were all evaluated. Furthermore, owing to its sensitive feature extraction, JPA achieved a limit of detection (LOD) that was up to thousands of times lower when automatically processing metabolomics data from a serially diluted metabolite standard mixture analyzed in HILIC(-) and RP(+) modes. Finally, the performance of JPA in exposome research was validated using a mixture of 250 drugs and 255 pesticides at environmentally relevant levels. JPA detected an average of 2.3-fold more exposure compounds than conventional peak picking alone.
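The joint extraction idea described above (conventional peak picking, plus MS2-based recognition, plus a targeted list) can be illustrated with a minimal Python sketch. This is not JPA's API, which is an R package. The `Feature` class, the `merge_features` function, the m/z and retention-time tolerances, and the example values are all hypothetical and chosen only for illustration. The sketch assumes each strategy has already produced candidate features, and shows only how candidates from the later strategies might be added when they do not duplicate a feature that is already kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    mz: float     # mass-to-charge ratio
    rt: float     # retention time in seconds
    source: str   # which strategy produced this feature


def is_same(a: Feature, b: Feature, mz_tol: float, rt_tol: float) -> bool:
    """Treat two features as the same if they agree within both tolerances."""
    return abs(a.mz - b.mz) <= mz_tol and abs(a.rt - b.rt) <= rt_tol


def merge_features(peak_picked, ms2_based, targeted,
                   mz_tol: float = 0.01, rt_tol: float = 30.0):
    """Combine features from three strategies, keeping the first occurrence.

    Priority order (an assumption of this sketch): conventional peak picking,
    then MS2-based recognition, then the user-defined targeted list.
    """
    merged = list(peak_picked)
    for candidates in (ms2_based, targeted):
        for cand in candidates:
            duplicate = any(is_same(cand, kept, mz_tol, rt_tol) for kept in merged)
            if not duplicate:
                merged.append(cand)
    return merged


# Hypothetical example values, not taken from the study.
peak_picked = [Feature(180.0634, 120.0, "peak_picking")]
ms2_based = [
    Feature(180.0636, 122.0, "ms2"),   # duplicate of the peak-picked feature
    Feature(255.2330, 300.0, "ms2"),   # rescued: missed by peak picking
]
targeted = [
    Feature(255.2331, 301.0, "targeted"),  # duplicate of the MS2-based feature
    Feature(152.0567, 95.0, "targeted"),   # rescued from the targeted list
]

merged = merge_features(peak_picked, ms2_based, targeted)
for feature in merged:
    print(feature)
```

In this sketch, features found by an earlier strategy take priority, so the later strategies only contribute features that would otherwise be missed. That mirrors the "rescue" framing in the abstract, but the actual matching rules used by JPA are not stated here.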