Improving peak detection in high-resolution LC/MS metabolomics data using preexisting knowledge and machine learning approach.

Affiliations

Department of Biostatistics and Bioinformatics, Rollins School of Public Health and Department of Medicine, School of Medicine, Emory University, Atlanta, GA 30322, USA.

Publication information

Bioinformatics. 2014 Oct 15;30(20):2941-8. doi: 10.1093/bioinformatics/btu430. Epub 2014 Jul 7.

DOI:10.1093/bioinformatics/btu430

PMID:25005748

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4184266/

Abstract

MOTIVATION

Peak detection is a key step in the preprocessing of untargeted metabolomics data generated from high-resolution liquid chromatography-mass spectrometry (LC/MS). The common practice is to use filters with predetermined parameters to select peaks in the LC/MS profile. This rigid approach can cause suboptimal performance when the choice of peak model and parameters does not suit the data characteristics.

RESULTS

Here we present a method that learns directly from various data features of the extracted ion chromatograms (EICs) to differentiate true peak regions from noise regions in the LC/MS profile. It utilizes the knowledge of known metabolites, as well as robust machine learning approaches. Unlike currently available methods, this new approach does not assume a parametric peak shape model and allows maximum flexibility. We demonstrate the superiority of the new approach using real data. Because matching to known metabolites entails uncertainties and cannot be considered a gold standard, we also developed a probabilistic receiver-operating characteristic (pROC) approach that can incorporate uncertainties.
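The abstract does not spell out how the pROC is computed, so the following is only a minimal sketch of one way a ROC curve could absorb label uncertainty. It assumes each candidate peak carries a probability of being a true peak (for example, from matching to known metabolites) and counts it fractionally toward both the true-positive and false-positive totals. The function names `soft_roc` and `auc`, and this fractional-count formulation, are illustrative assumptions and not the authors' definition or any part of the apLCMS API.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def soft_roc(scores: Sequence[float], p_true: Sequence[float]) -> List[Point]:
    """Return (FPR, TPR) points for a ROC curve built from probabilistic labels.

    Assumption (not taken from the paper): candidate i is a true peak with
    probability p_true[i], so it adds p_true[i] to the expected true-positive
    count and 1 - p_true[i] to the expected false-positive count.
    Tied scores are not grouped; each candidate is its own step.
    """
    if len(scores) != len(p_true):
        raise ValueError("scores and p_true must have the same length")
    total_pos = sum(p_true)
    total_neg = len(p_true) - total_pos
    if total_pos <= 0 or total_neg <= 0:
        raise ValueError("need nonzero expected positives and negatives")

    # Rank candidates from the highest detector score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    points: List[Point] = [(0.0, 0.0)]
    tp = 0.0
    fp = 0.0
    for i in order:
        tp += p_true[i]
        fp += 1.0 - p_true[i]
        points.append((fp / total_neg, tp / total_pos))
    return points


def auc(points: Sequence[Point]) -> float:
    """Area under a piecewise-linear curve, by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Hypothetical detector scores and match probabilities for four candidates.
    curve = soft_roc([0.9, 0.8, 0.3, 0.1], [0.95, 0.60, 0.40, 0.05])
    print(curve)
    print(auc(curve))
```

With hard labels (probabilities of exactly 0 or 1) this reduces to the ordinary empirical ROC curve, which is one reason the fractional-count form is a convenient starting point.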

AVAILABILITY AND IMPLEMENTATION

The new peak detection approach is implemented as part of the apLCMS package, available at http://web1.sph.emory.edu/apLCMS/

CONTACT

tyu8@emory.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

