Bioinformatics Research Center, Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27606, United States.
Immunity, Inflammation, and Disease Laboratory, National Institute of Environmental Health Sciences, Durham, North Carolina 27709, United States.
Anal Chem. 2024 Oct 8;96(40):15970-15979. doi: 10.1021/acs.analchem.4c03256. Epub 2024 Sep 18.
Nontargeted analysis (NTA) is increasingly utilized for its ability to identify key molecular features beyond known targets in complex samples. NTA is particularly advantageous in exploratory studies aimed at identifying phenotype-associated features or molecules able to classify various sample types. However, implementing NTA involves extensive data analyses and labor-intensive annotations. To address these limitations, we developed a rapid data screening capability compatible with NTA data collected on a liquid chromatography, ion mobility spectrometry, and mass spectrometry (LC-IMS-MS) platform that allows for sample classification while highlighting potential features of interest. Specifically, this method aggregates the thousands of IMS-MS spectra collected across the LC space for each sample and collapses the LC dimension, resulting in a single summed IMS-MS spectrum for screening. The summed IMS-MS spectra are then analyzed with a bootstrapped Lasso technique to identify key regions or coordinates for phenotype classification via support vector machines. Molecular annotations are then performed by examining the features present in the selected coordinates, highlighting potential molecular candidates. To demonstrate this summed IMS-MS screening approach, we applied it to clinical plasma lipidomic NTA data and exposomic NTA data from water sites with varying contaminant levels. Distinguishing coordinates were observed in both studies, enabling the evaluation of phenotypic molecular annotations and resulting in screening models capable of classifying samples with up to a 25% increase in accuracy compared to models using annotated data.
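The screening workflow described above (collapse the LC dimension, select coordinates with a bootstrapped Lasso, classify with a support vector machine) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no grid sizes, penalties, or thresholds, so the synthetic data, the helper names `sum_over_lc` and `bootstrapped_lasso`, and the values of `alpha`, `n_boot`, and `min_freq` are all hypothetical, and scikit-learn's `Lasso` and `SVC` stand in for whatever software the study used.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.svm import SVC


def sum_over_lc(frames):
    """Collapse the LC axis of one sample: (n_lc, n_drift, n_mz) -> (n_drift, n_mz)."""
    return np.asarray(frames).sum(axis=0)


def bootstrapped_lasso(X, y, n_boot=50, alpha=0.05, min_freq=0.6, seed=0):
    """Return column indices with a nonzero Lasso coefficient in at least
    min_freq of the bootstrap resamples (all settings are assumed values)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = np.zeros(X.shape[1])
    valid = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample samples with replacement
        if np.unique(y[idx]).size < 2:        # skip resamples missing a class
            continue
        model = Lasso(alpha=alpha, max_iter=10000).fit(X[idx], y[idx])
        counts += model.coef_ != 0
        valid += 1
    return np.flatnonzero(counts / max(valid, 1) >= min_freq)


# Synthetic stand-in data: 24 samples, each a stack of 30 LC scans of an
# 8 x 10 (drift time x m/z) IMS-MS grid. Nothing here comes from the study.
rng = np.random.default_rng(1)
n_samples, n_lc, n_drift, n_mz = 24, 30, 8, 10
labels = np.array([0, 1] * (n_samples // 2))
cube = rng.poisson(5.0, size=(n_samples, n_lc, n_drift, n_mz)).astype(float)
for i in np.flatnonzero(labels == 1):
    cube[i, 10:15, 3, 7] += 40.0              # class-1 signal at drift bin 3, m/z bin 7

# One summed IMS-MS spectrum per sample, flattened into a feature vector.
summed = np.stack([sum_over_lc(sample) for sample in cube])
X = summed.reshape(n_samples, -1)

# Hold out the last 8 samples; standardize using training statistics only.
X_train, X_test = X[:16], X[16:]
y_train, y_test = labels[:16].astype(float), labels[16:]
mean, std = X_train.mean(axis=0), X_train.std(axis=0) + 1e-9
X_train, X_test = (X_train - mean) / std, (X_test - mean) / std

selected = bootstrapped_lasso(X_train, y_train)
coords = [tuple(int(v) for v in np.unravel_index(j, (n_drift, n_mz))) for j in selected]

# Classify using only the selected coordinates.
svm = SVC(kernel="linear", C=1.0).fit(X_train[:, selected], labels[:16])
accuracy = svm.score(X_test[:, selected], y_test)
print("selected (drift, m/z) coordinates:", coords)
print("held-out accuracy:", accuracy)
```

In this sketch the selected coordinates are positions on the summed drift time by m/z grid, so annotation would proceed by looking up which features fall at those positions. Selection is run on the training split only so that the held-out accuracy is not inflated by the feature selection step.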