Van Grouw Alexandria, Rainey Markace A, Reid Olivia K, Ogle Molly M, Moore Samuel G, Temenoff Johnna S, Fernández Facundo M
School of Chemistry and Biochemistry, Georgia Institute of Technology, 901 Atlantic Drive, Atlanta, Georgia 30332, USA.
Systems Mass Spectrometry Core, Parker H. Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, 315 Ferst Drive NW, Atlanta, Georgia 30332, USA.
J Chem Inf Model. 2025 Feb 24;65(4):1826-1836. doi: 10.1021/acs.jcim.4c02040. Epub 2025 Feb 5.
Specificity, sensitivity, and high metabolite coverage make mass spectrometry (MS) one of the most valuable tools in metabolomics and lipidomics. However, translating metabolomics MS methods to multiyear studies conducted across multiple batches is limited by variability in electrospray ionization response, which makes batch-to-batch comparisons challenging. This limitation creates an artificial divide between nontargeted discovery work, which is broad in scope but limited in absolute quantitation ability, and targeted work, which is highly accurate but limited in scope because it requires matched isotopically labeled standards. These issues are often observed in stem cell studies using metabolomic and lipidomic MS approaches, where patient recruitment can be a years-long process and samples become available in discrete batches every few months. To bridge this gap, we developed a machine learning model that predicts electrospray ionization sensitivity for lipid classes that have shown correlation with stem cell potency. Molecular descriptors derived from these lipids' chemical structures are used as model input to predict electrospray response, enabling quantitation by MS with moderate accuracy (semiquantitation). Model performance was evaluated via internal and external validation using cultured cells from various stem cell donors, achieving global percent errors of 40% and 20% for the positive and negative electrospray ion modes, respectively. Although this accuracy is typically insufficient for traditional targeted lipidomics experiments, it is sufficient for semiquantitative estimation of lipid marker concentrations across batches without the need for specific chemical standards, which are often unavailable. Furthermore, the precision of model-predicted concentrations was 16.9% for the positive mode and 7.5% for the negative mode, indicating promise for data harmonization across batches. The set of molecular descriptors used by the models described here yielded higher accuracy than descriptor sets previously published in the literature, showing strong promise for semiquantitative lipidomics.
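As a rough illustration of how a predicted ionization response could be turned into a semiquantitative concentration and then scored, the sketch below uses only the Python standard library. It is not the authors' model or pipeline. It assumes a simple linear relation (peak area = response factor × concentration), assumes percent error is the absolute deviation from a reference concentration, and assumes precision is reported as percent relative standard deviation; the abstract does not specify any of these definitions. All function names and numbers are hypothetical.

```python
from statistics import mean, stdev


def semiquant_concentration(peak_area, predicted_response):
    """Estimate concentration as peak area divided by a predicted response factor.

    Assumption: signal is linear in concentration, so the response factor
    has units of peak area per unit concentration.
    """
    if predicted_response <= 0:
        raise ValueError("predicted response factor must be positive")
    return peak_area / predicted_response


def percent_error(estimate, reference):
    """Absolute percent deviation of an estimate from a reference value."""
    return abs(estimate - reference) / reference * 100.0


def percent_rsd(values):
    """Percent relative standard deviation (sample standard deviation / mean)."""
    return stdev(values) / mean(values) * 100.0


# Hypothetical replicate peak areas for one lipid in one batch.
replicate_areas = [1.0e6, 1.1e6, 0.9e6]

# Hypothetical response factor, standing in for the output of a model
# trained on molecular descriptors (area counts per micromolar).
predicted_response = 2.0e5

concentrations = [
    semiquant_concentration(area, predicted_response) for area in replicate_areas
]
mean_concentration = mean(concentrations)

# Hypothetical reference concentration measured with a matched standard.
reference_concentration = 4.0

accuracy_pct = percent_error(mean_concentration, reference_concentration)
precision_pct = percent_rsd(concentrations)

print(f"mean concentration: {mean_concentration:.2f} uM")
print(f"percent error: {accuracy_pct:.1f}%")
print(f"precision (RSD): {precision_pct:.1f}%")
```

In this framing, accuracy depends on how well the response factor is predicted, while precision reflects replicate scatter, which is why the two figures in the abstract can differ substantially.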