Hosseini Ramtin, Hassanpour Neda, Liu Li-Ping, Hassoun Soha
Department of Computer Science, Tufts University, Medford, MA 02155, USA.
Metabolites. 2020 May 3;10(5):183. doi: 10.3390/metabo10050183.
Untargeted metabolomics comprehensively characterizes small molecules and elucidates activities of biochemical pathways within a biological sample. Despite computational advances, interpreting collected measurements and determining their biological role remains a challenge. To interpret measurements, we present an inference-based approach, termed Probabilistic modeling for Untargeted Metabolomics Analysis (PUMA). Our approach captures metabolomics measurements and the biological network for the biological sample under study in a generative model and uses stochastic sampling to compute posterior probability distributions. PUMA predicts the likelihood of pathways being active, and then derives probabilistic annotations, which assign chemical identities to measurements. Unlike prior pathway analysis tools that analyze differentially active pathways, PUMA defines a pathway as active if the likelihood that the pathway generated the observed measurements is above a particular (user-defined) threshold. Due to the lack of "ground truth" metabolomics datasets, where all measurements are annotated and pathway activities are known, PUMA is validated on synthetic datasets that are designed to mimic cellular processes. PUMA, on average, outperforms pathway enrichment analysis by 8%. PUMA is applied to two case studies. PUMA suggests many biologically meaningful pathways as active. Annotation results were in agreement with those obtained using other tools that utilize additional information in the form of spectral signatures. Importantly, PUMA annotates many measurements, suggesting 23 chemical identities for metabolites that were previously only identified as isomers, and a significant number of additional putative annotations over spectral database lookups. For an experimentally validated 50-compound dataset, annotations using PUMA yielded 0.833 precision and 0.676 recall.
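The thresholding step described in the abstract can be illustrated with a minimal sketch. It is not PUMA's implementation: the generative model and the sampler are omitted, and the function names, pathway names, sample values, and the 0.7 threshold are all hypothetical. The sketch assumes that stochastic sampling has already produced draws in which each pathway is marked on (1) or off (0), estimates each pathway's posterior probability of being active as the fraction of draws in which it is on, and then calls a pathway active when that estimate exceeds a user-defined threshold.

```python
from typing import Dict, List


def posterior_activity(samples: List[Dict[str, int]]) -> Dict[str, float]:
    """Estimate P(pathway active | data) as the fraction of draws where it is on."""
    if not samples:
        raise ValueError("need at least one posterior sample")
    pathways = samples[0].keys()
    return {p: sum(s[p] for s in samples) / len(samples) for p in pathways}


def call_active(posterior: Dict[str, float], threshold: float) -> List[str]:
    """Return pathways whose estimated posterior exceeds the user-defined threshold."""
    return sorted(p for p, prob in posterior.items() if prob > threshold)


# Hypothetical draws standing in for the output of a sampler (assumption:
# each draw assigns every pathway a binary activity indicator).
samples = [
    {"glycolysis": 1, "tca_cycle": 1, "urea_cycle": 0},
    {"glycolysis": 1, "tca_cycle": 0, "urea_cycle": 0},
    {"glycolysis": 1, "tca_cycle": 1, "urea_cycle": 0},
    {"glycolysis": 1, "tca_cycle": 1, "urea_cycle": 1},
]

posterior = posterior_activity(samples)
active = call_active(posterior, threshold=0.7)  # threshold value is illustrative
print(posterior)
print(active)
```

Raising the threshold makes the set of pathways called active smaller and more conservative; lowering it admits pathways with weaker posterior support.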