Biomolecular Medicine, Department of Surgery & Cancer, Faculty of Medicine, Sir Alexander Fleming Building, Imperial College London, SW7 2AZ, United Kingdom.
Anal Chem. 2010 Sep 1;82(17):7319-28. doi: 10.1021/ac101278x.
It has long been recognized that estimates of isotopic abundance patterns may be instrumental in identifying the many unknown compounds encountered when conducting untargeted metabolic profiling using liquid chromatography/mass spectrometry. While numerous methods have been developed for assigning heuristic scores to rank the degree of fit of the observed abundance patterns with theoretical ones, little work has been done to quantify the errors that are associated with the measurements made. Thus, it is generally not possible to determine, in a statistically meaningful manner, whether a given chemical formula would likely be capable of producing the observed data. In this paper, we present a method for constructing confidence regions for the isotopic abundance patterns based on the fundamental distribution of the ion arrivals. Moreover, we develop a method for doing so that makes use of the information pooled together from the measurements obtained across an entire chromatographic peak, as well as from any adducts, dimers, and fragments observed in the mass spectra. This greatly increases the statistical power, thus enabling the analyst to rule out a potentially much larger number of candidate formulas while explicitly guarding against false positives. In practice, small departures from the model assumptions are possible due to detector saturation and interferences between adjacent isotopologues. While these factors form impediments to statistical rigor, they can to a large extent be overcome by restricting the analysis to moderate ion counts and by applying robust statistical methods. Using real metabolic data, we demonstrate that the method is capable of reducing the number of candidate formulas by a substantial amount, even when no bromine or chlorine atoms are present. We argue that further developments in our ability to characterize the data mathematically could enable much more powerful statistical analyses.
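As a rough illustration of the idea of testing a candidate formula against pooled ion counts, the following is a minimal sketch and not the authors' method. It assumes that ion arrivals are Poisson distributed, so that the counts in each isotopologue peak, conditional on the total, follow a multinomial distribution. It pools the counts across the scans of one chromatographic peak and applies a Pearson chi-square test at the 5% level. The set of formulas that pass the test then corresponds to an approximate 95% confidence region for the abundance pattern. The function names, the nominal-mass treatment of isotopologues, and the rounded natural abundances are assumptions made for this example. Adducts, dimers, fragments, detector saturation, and robust estimation are not modelled.

```python
# Approximate natural isotope abundances, indexed by nominal mass offset (0, +1, +2).
# Rounded values used for illustration only.
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
}

# Upper 5% critical values of the chi-square distribution, keyed by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def convolve(a, b, n_peaks):
    """Convolve two abundance vectors, keeping only the first n_peaks entries."""
    out = [0.0] * n_peaks
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < n_peaks:
                out[i + j] += x * y
    return out


def theoretical_pattern(formula, n_peaks=3):
    """Relative abundances of the M, M+1, ... peaks for a formula such as {"C": 6, "H": 12}."""
    pattern = [1.0] + [0.0] * (n_peaks - 1)
    for element, count in formula.items():
        for _ in range(count):
            pattern = convolve(pattern, ISOTOPES[element], n_peaks)
    total = sum(pattern)  # renormalise over the peaks that were kept
    return [p / total for p in pattern]


def is_consistent(formula, scans):
    """Pool ion counts over all scans and test them against the formula's pattern.

    scans is a list of per-scan ion counts, one count per isotopologue peak.
    Returns (passes_test, chi_square_statistic).
    """
    n_peaks = len(scans[0])
    pooled = [sum(scan[k] for scan in scans) for k in range(n_peaks)]
    total = sum(pooled)
    expected = [total * p for p in theoretical_pattern(formula, n_peaks)]
    stat = sum((o - e) ** 2 / e for o, e in zip(pooled, expected))
    return stat <= CHI2_CRIT_05[n_peaks - 1], stat
```

Calling `is_consistent({"C": 6, "H": 12, "O": 6}, scans)` with `scans` given as a list of `[M, M+1, M+2]` ion counts returns whether the formula remains a candidate, together with the test statistic. Pooling before testing is what increases the power: the relative standard error of each pooled count shrinks as the total number of ions grows, so formulas with only slightly different patterns can be ruled out.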