Riquelme Gabriel, Zabalegui Nicolás, Marchi Pablo, Jones Christina M, Monge María Eugenia
Centro de Investigaciones en Bionanociencias (CIBION), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Godoy Cruz 2390, Ciudad de Buenos Aires C1425FQD, Argentina.
Departamento de Química Inorgánica Analítica y Química Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Universitaria, Buenos Aires C1428EGA, Argentina.
Metabolites. 2020 Oct 16;10(10):416. doi: 10.3390/metabo10100416.
Preprocessing data in a reproducible and robust way is one of the current challenges in untargeted metabolomics workflows. Data curation in liquid chromatography-mass spectrometry (LC-MS) involves the removal of biologically non-relevant features (retention time, m/z pairs) to retain only high-quality data for subsequent analysis and interpretation. The present work introduces TidyMS, a package for the Python programming language for preprocessing LC-MS data for quality control (QC) procedures in untargeted metabolomics workflows. It is a versatile strategy that can be customized or fit for purpose according to the specific metabolomics application. It allows performing quality control procedures to ensure accuracy and reliability in LC-MS measurements, and it allows preprocessing metabolomics data to obtain cleaned matrices for subsequent statistical analysis. The capabilities of the package are shown with pipelines for an LC-MS system suitability check, system conditioning, signal drift evaluation, and data curation. These applications were implemented to preprocess data corresponding to a new suite of candidate plasma reference materials developed by the National Institute of Standards and Technology (NIST; hypertriglyceridemic, diabetic, and African-American plasma pools) to be used in untargeted metabolomics studies in addition to NIST SRM 1950 Metabolites in Frozen Human Plasma. The package offers a rapid and reproducible workflow that can be used in an automated or semi-automated fashion, and it is an open and free tool available to all users.
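As a rough illustration of what data curation can involve, the sketch below filters features using two criteria that are common in untargeted metabolomics: the coefficient of variation (CV) across pooled QC injections and the blank-to-QC intensity ratio. It does not use the TidyMS API. The function name `curate_features`, the data layout, and both thresholds are assumptions made for this example, and the abstract does not state which criteria the package applies.

```python
from statistics import mean, stdev


def curate_features(data, sample_class, cv_max=0.2, blank_ratio_max=0.1):
    """Return the names of features that pass two illustrative QC filters.

    data: dict mapping feature name -> dict mapping sample name -> intensity
    sample_class: dict mapping sample name -> "qc", "blank", or "sample"
    cv_max, blank_ratio_max: hypothetical thresholds, not values from the paper
    """
    qc = [s for s, c in sample_class.items() if c == "qc"]
    blank = [s for s, c in sample_class.items() if c == "blank"]
    kept = []
    for feature, intensities in data.items():
        # stdev needs at least two QC injections
        qc_values = [intensities[s] for s in qc]
        qc_mean = mean(qc_values)
        if qc_mean <= 0:
            continue  # feature not detected in the QC samples
        cv = stdev(qc_values) / qc_mean
        blank_mean = mean(intensities[s] for s in blank) if blank else 0.0
        # keep features that are reproducible in QC and low in blanks
        if cv <= cv_max and blank_mean / qc_mean <= blank_ratio_max:
            kept.append(feature)
    return kept


# Toy data matrix: three features measured in three QC injections and one blank.
data = {
    "rt=1.20,mz=180.06": {"qc1": 1000, "qc2": 1050, "qc3": 980, "blank1": 10},
    "rt=3.45,mz=255.23": {"qc1": 500, "qc2": 900, "qc3": 200, "blank1": 5},
    "rt=5.10,mz=301.14": {"qc1": 300, "qc2": 310, "qc3": 295, "blank1": 250},
}
sample_class = {"qc1": "qc", "qc2": "qc", "qc3": "qc", "blank1": "blank"}

kept = curate_features(data, sample_class)
print(kept)  # ['rt=1.20,mz=180.06']
```

In this toy example the second feature is dropped for its high QC variability and the third for its high blank signal, leaving a cleaned feature list for statistical analysis.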