Departments of Medicine and Pharmacology, University of California San Diego, La Jolla, California 92093, United States.
Cardiovascular Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts 02115, United States.
Anal Chem. 2017 Feb 7;89(3):1399-1404. doi: 10.1021/acs.analchem.6b04337. Epub 2017 Jan 26.
Untargeted liquid chromatography-mass spectrometry (LC-MS)-based metabolomics analysis of human biospecimens has become one of the most promising strategies for probing the underpinnings of human health and disease. Analysis of spectral data across population-scale cohorts, however, is precluded by day-to-day nonlinear signal drifts in LC retention time or batch effects that complicate comparison of thousands of untargeted peaks. To date, there exists no efficient means of visualizing and quantitatively assessing signal drift, correcting drift when present, and automatically filtering unstable spectral features, particularly across thousands of data files in population-scale experiments. Herein, we report the development of a set of R-based scripts that allow for pre- and postprocessing of raw LC-MS data. These methods can be integrated with existing data analysis workflows by providing, as an initial preprocessing step, bulk nonlinear retention time correction at the raw data level. Further, this approach provides postprocessing visualization and quantification of peak alignment accuracy, as well as peak-reliability-based parsing of processed data through hierarchical clustering of signal profiles. In a metabolomics data set derived from ∼3000 human plasma samples, we found that application of our alignment tools resulted in substantial improvement in peak alignment accuracy, automated data filtering, and ultimately statistical power for detection of metabolite correlates of clinical measures. These tools will enable metabolomics studies of population-scale cohorts.
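To make the idea of nonlinear retention time correction concrete, the following is a minimal sketch and not the authors' R implementation. It assumes that a few landmark compounds can be matched between a drifted run and a reference run, and it uses a piecewise-linear mapping between those landmarks as a simple stand-in for a nonlinear correction. The function name `build_rt_corrector` and all retention times are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right


def build_rt_corrector(observed_landmarks, reference_landmarks):
    """Return a function mapping observed retention times to the reference scale.

    Both inputs hold retention times (same units) of the same landmark
    compounds: first in the drifted run, then in the reference run.
    """
    pairs = sorted(zip(observed_landmarks, reference_landmarks))
    obs = [p[0] for p in pairs]
    ref = [p[1] for p in pairs]
    if len(obs) < 2:
        raise ValueError("need at least two landmarks")
    if any(b <= a for a, b in zip(obs, obs[1:])):
        raise ValueError("observed landmark times must be distinct")

    def correct(rt):
        # Pick the landmark segment containing rt. Clamping to the first or
        # last segment extrapolates linearly outside the landmark range.
        i = min(max(bisect_right(obs, rt) - 1, 0), len(obs) - 2)
        frac = (rt - obs[i]) / (obs[i + 1] - obs[i])
        return ref[i] + frac * (ref[i + 1] - ref[i])

    return correct


# Hypothetical landmark retention times (minutes): drifted run vs. reference.
observed = [1.10, 3.40, 6.05, 9.30]
reference = [1.00, 3.20, 6.00, 9.00]

correct = build_rt_corrector(observed, reference)
corrected_peak = correct(4.725)  # a peak halfway between the 2nd and 3rd landmarks
```

Because the drift differs from segment to segment (0.10 min early in the run, 0.30 min late in it), a single global offset would not align these peaks, which is why a segment-wise or otherwise nonlinear mapping is needed. A smoother model, such as a spline or loess fit, could replace the linear segments without changing the interface.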