Repository scale classification and decomposition of tandem mass spectral data.

Affiliation

Computational Biology Department in the School of Computer Science, Carnegie Mellon University, Pittsburgh, USA.

Publication information

Sci Rep. 2021 Apr 15;11(1):8314. doi: 10.1038/s41598-021-87796-6.

DOI:10.1038/s41598-021-87796-6

PMID:33859284

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8050247/

Abstract

Various studies have shown associations between molecular features and phenotypes of biological samples. These studies, however, focus on a single phenotype per study and are not applicable to repository scale metabolomics data. Here we report MetSummarizer, a method for predicting (i) the biological phenotypes of environmental and host-oriented samples, and (ii) the raw ingredient composition of complex mixtures. We show that the aggregation of various metabolomic datasets can improve the accuracy of predictions. Since these datasets have been collected using different standards at various laboratories, in order to get unbiased results it is crucial to detect and discard standard-specific features during the classification step. We further report high accuracy in prediction of the raw ingredient composition of complex foods from the Global Foodomics Project.
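The abstract does not spell out how standard-specific features are detected, so the following is only a minimal sketch of the general idea, not the MetSummarizer algorithm. It assumes each sample is a set of feature identifiers tagged with its source dataset, treats any feature seen in exactly one dataset as dataset-specific (a hypothetical rule), and uses a nearest-neighbour classifier with Jaccard similarity as a stand-in for the real classifier. All sample names, feature IDs, and function names are invented for illustration.

```python
from collections import defaultdict

def dataset_specific_features(samples):
    """Return features observed in exactly one dataset (assumed filter rule)."""
    seen_in = defaultdict(set)  # feature -> datasets in which it occurs
    for s in samples:
        for f in s["features"]:
            seen_in[f].add(s["dataset"])
    return {f for f, datasets in seen_in.items() if len(datasets) == 1}

def filter_samples(samples, drop):
    """Copy the samples with the dropped features removed."""
    return [{**s, "features": s["features"] - drop} for s in samples]

def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def predict(query_features, training, drop):
    """Label the query with the phenotype of its most similar training sample."""
    q = query_features - drop
    best = max(training, key=lambda s: jaccard(q, s["features"]))
    return best["phenotype"]

# Toy data: m* are shared molecular features, s* mimic lab-specific standards.
samples = [
    {"dataset": "lab_A", "phenotype": "soil", "features": {"m1", "m2", "sA"}},
    {"dataset": "lab_A", "phenotype": "gut",  "features": {"m3", "m4", "sA"}},
    {"dataset": "lab_B", "phenotype": "soil", "features": {"m1", "m2", "sB1", "sB2"}},
    {"dataset": "lab_C", "phenotype": "gut",  "features": {"m3", "m4", "sC"}},
]

drop = dataset_specific_features(samples)
training = filter_samples(samples, drop)

# A gut sample measured at lab_B, which contributed only soil samples.
query = {"m3", "sB1", "sB2"}
biased = predict(query, samples, set())     # standards kept: matches lab_B
unbiased = predict(query, training, drop)   # standards discarded
print(biased, unbiased)
```

In this toy case the unfiltered classifier matches the query to the lab_B soil sample because of the shared standards, while the filtered one recovers the gut label. The one-dataset rule is deliberately crude: it would also discard genuine molecular features that happen to occur in a single dataset.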
