Department of Molecular & Cellular Biochemistry, University of Kentucky, Lexington, KY, USA.
Markey Cancer Center, University of Kentucky, Lexington, KY, USA.
Metabolomics. 2018 Sep 17;14(10):125. doi: 10.1007/s11306-018-1426-9.
Direct injection Fourier-transform mass spectrometry (FT-MS) allows for the high-throughput and high-resolution detection of thousands of metabolite-associated isotopologues. However, spectral artifacts can generate large numbers of spectral features (peaks) that do not correspond to known compounds. Misassignment of these artifactual features creates interpretive errors and limits our ability to discern the role of representative features within living systems.
Our goal is to develop rigorous methods that identify and handle spectral artifacts within the context of high-throughput FT-MS-based metabolomics studies.
We observed three types of artifacts unique to FT-MS, which we collectively named high peak density (HPD) sites: fuzzy sites, ringing, and partial ringing. While ringing artifacts are well known, fuzzy sites and partial ringing have not previously been well characterized in the literature. We developed new computational methods that compare peak density across regions of a spectrum to identify the regions containing fuzzy sites. We used these methods to identify and eliminate fuzzy site artifacts in an example dataset of paired cancer and non-cancer lung tissue samples, and we evaluated the impact of these artifacts on classification accuracy and robustness.
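The idea of comparing peak density within a spectrum can be illustrated with a minimal sketch. This is not the authors' published algorithm: the sliding-window scheme, the median-plus-MAD threshold, and the names `peak_density`, `flag_high_density`, `remove_flagged`, `window`, `step`, and `k` are all assumptions introduced here for illustration only.

```python
from bisect import bisect_left
from statistics import median


def peak_density(mz_values, window=1.0, step=0.5):
    """Count peaks in sliding m/z windows [start, start + window).

    Returns a list of (window_start, peak_count) tuples.
    The window and step sizes are illustrative assumptions.
    """
    mz = sorted(mz_values)
    if not mz:
        return []
    densities = []
    i = 0
    start = mz[0]
    while start <= mz[-1]:
        lo = bisect_left(mz, start)
        hi = bisect_left(mz, start + window)
        densities.append((start, hi - lo))
        i += 1
        start = mz[0] + i * step  # avoid accumulating float error
    return densities


def flag_high_density(densities, k=5.0):
    """Flag windows whose count exceeds median + k * MAD (hypothetical rule)."""
    counts = [c for _, c in densities]
    med = median(counts)
    mad = median([abs(c - med) for c in counts])
    threshold = med + k * max(mad, 1.0)  # floor the MAD so it is never zero
    return [(s, c) for s, c in densities if c > threshold]


def remove_flagged(mz_values, flagged, window=1.0):
    """Drop every peak that falls inside any flagged window."""
    return [
        m for m in mz_values
        if not any(s <= m < s + window for s, _ in flagged)
    ]
```

In this sketch, a region is treated as suspicious only relative to the rest of the same spectrum, so no absolute peak-count cutoff is needed; a real implementation would have to choose the window width and threshold to suit the instrument's resolution and m/z range.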
Our methods robustly identified consistent fuzzy site artifacts in our FT-MS metabolomics spectral data. Without artifact identification and removal, classifiers achieved 91.4% accuracy on an example lung cancer dataset; however, these classifiers relied heavily on artifactual features present in fuzzy sites. Proper removal of fuzzy site artifacts produced a more robust classifier based on non-artifactual features, with a slightly improved accuracy of 92.4% in our example analysis.