Department of Physics, The George Washington University, Washington, D.C. 20052, United States.
Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland 20742, United States.
Anal Chem. 2019 May 7;91(9):5768-5776. doi: 10.1021/acs.analchem.8b05985. Epub 2019 Apr 15.
Recent developments in high-resolution mass spectrometry (HRMS) technology have enabled ultrasensitive detection of proteins, peptides, and metabolites in limited amounts of sample, even single cells. However, extracting trace-abundance signals from the complex data sets (m/z value, separation time, signal abundance) that result from ultrasensitive studies requires improved data processing algorithms. To bridge this gap, we developed "Trace", a software framework that incorporates machine learning (ML) to automate feature selection and optimization for the extraction of trace-level signals from HRMS data. The method was validated using primary (raw) and manually curated data sets from single-cell metabolomic studies of the South African clawed frog (Xenopus laevis) embryo using capillary electrophoresis electrospray ionization HRMS. We demonstrated that Trace combines sensitivity, accuracy, and robustness with high data processing throughput to recognize signals, including those previously identified as metabolites in single-cell capillary electrophoresis HRMS measurements that we conducted over several months. These performance metrics, combined with compatibility with MS data in the open-source mzML format, make Trace an attractive software resource to facilitate data analysis for studies employing ultrasensitive high-resolution MS.