Chang Christine H, Schwartz Sydney C, Im Alexandria K, Bloodsworth Kent J, Webb-Robertson Bobbie-Jo M, Ewing Robert G, Metz Thomas O, Ross Dylan H
Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99354, United States.
Anal Chem. 2025 Jul 8;97(26):13861-13871. doi: 10.1021/acs.analchem.5c01067. Epub 2025 Jun 25.
Identification of compounds with minimal ambiguity remains a central challenge in mass spectrometry-based metabolomics. Conventional compound identification relies on comparing analytical signatures (e.g., mass-to-charge ratio, collision cross section, tandem mass spectra) against reference data obtained from measurements of authentic chemical standards. The breadth of compounds annotatable with this approach is necessarily limited by the availability of authentic standards, by analytical throughput, and by the resolving power of the separations that underlie the measurements. The maturation of computational methods, both theory-driven and artificial intelligence/machine learning-based, for predicting molecular properties relevant to multidimensional mass spectrometry measurements has opened the door to a new "reference-free" paradigm of compound annotation. By augmenting existing reference data for molecular properties with computational predictions, one can expand the universe of identifiable chemical species well beyond its current limits. An unexplored aspect of this approach is how to gauge confidence in the resulting annotations, especially as the compound search space expands. Intuitively, the confidence of a compound annotation depends on the inherent discriminatory power of the molecular properties used for identification and on the precision with which those properties are measured or predicted. In this work, we characterize this relationship between measurement precision and identification probability systematically and quantitatively for a defined region of chemical space that includes organic small-molecule metabolites. Importantly, this work establishes a framework for metabolite identification probability analysis that enables others to quantify this relationship for their own compounds and properties of interest.
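The intuition that annotation confidence depends on both the discriminatory power of the properties and the precision of their measurement can be illustrated with a toy Monte Carlo simulation. This is a minimal sketch under loudly stated assumptions, not the authors' framework: the four-compound library, its m/z and CCS values, the Gaussian relative-error model, the nearest-candidate matching rule, and the function name identification_probability are all hypothetical and chosen only for illustration.

```python
import random

# HYPOTHETICAL reference library of (name, m/z, CCS in square angstroms).
# The values are illustrative placeholders, not data from the paper.
LIBRARY = [
    ("A", 180.0634, 140.1),
    ("B", 180.0650, 142.0),  # ~9 ppm heavier than A, ~1.4% larger CCS
    ("C", 180.0634, 145.5),  # same m/z as A; separable only by CCS
    ("D", 181.0707, 141.0),  # ~1 Da heavier; easily separated by m/z
]


def identification_probability(library, true_idx, mz_ppm, ccs_pct,
                               trials=2000, seed=0):
    """Estimate P(the true compound is the top-ranked candidate).

    Each trial simulates a measurement of library[true_idx] with Gaussian
    noise (relative SD of mz_ppm ppm for m/z and ccs_pct percent for CCS),
    then ranks every candidate by its squared noise-normalized distance to
    the simulated measurement, i.e. a maximum-likelihood match under a
    Gaussian error model with the same SDs for every candidate.
    """
    if mz_ppm <= 0 or ccs_pct <= 0:
        raise ValueError("precisions must be positive")
    rng = random.Random(seed)  # fixed seed -> reproducible estimate
    _, mz_true, ccs_true = library[true_idx]
    hits = 0
    for _ in range(trials):
        mz_obs = rng.gauss(mz_true, mz_true * mz_ppm * 1e-6)
        ccs_obs = rng.gauss(ccs_true, ccs_true * ccs_pct * 1e-2)
        # Tolerances come from the observed values, so within one trial
        # they are identical for every candidate.
        sd_mz = mz_obs * mz_ppm * 1e-6
        sd_ccs = ccs_obs * ccs_pct * 1e-2
        scores = [((mz_obs - mz) / sd_mz) ** 2 + ((ccs_obs - ccs) / sd_ccs) ** 2
                  for _, mz, ccs in library]
        best = min(range(len(library)), key=scores.__getitem__)
        if best == true_idx:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for mz_ppm, ccs_pct in [(1, 0.3), (5, 1.0), (10, 3.0)]:
        p = identification_probability(LIBRARY, 0, mz_ppm, ccs_pct)
        print(f"{mz_ppm:>3} ppm, {ccs_pct:>4}% CCS -> P(correct ID) = {p:.3f}")
```

In this toy setting, loosening the precision lowers the probability that compound A is correctly identified, because its near neighbors B and C become statistically indistinguishable from it, while compound D stays easy to identify because its m/z alone is highly discriminating. Candidate C, which shares A's m/z, shows how identification can rest entirely on the precision of a single property.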