Department of Analytical Chemistry, University of Vienna, A-1090 Vienna, Austria.
Department of Agrobiotechnology IFA-Tulln, Institute of Bioanalytics and Agro-Metabolomics, University of Natural Resources and Life Sciences, Vienna, A-3430 Tulln, Austria.
Bioinformatics. 2022 Jun 27;38(13):3422-3428. doi: 10.1093/bioinformatics/btac344.
Chromatographic peak picking is among the first steps in data processing workflows of raw LC-HRMS datasets in untargeted metabolomics applications. Its performance is crucial for the holistic detection of all metabolic features as well as their relative quantification for statistical analysis and metabolite identification. Random noise, non-baseline separated compounds and unspecific background signals complicate this task.
A machine-learning-based approach named PeakBot was developed for detecting chromatographic peaks in LC-HRMS profile-mode data. It first detects all local signal maxima in a chromatogram, which are then extracted as super-sampled, standardized areas (retention time versus m/z). These areas are subsequently inspected by a custom-trained convolutional neural network (CNN) that forms the basis of PeakBot's architecture. For each local maximum, the model reports whether it is the apex of a chromatographic peak, as well as the peak center and bounding box. On the training and independent validation datasets used during development, PeakBot discriminated between chromatographic peaks and background signals with high performance (accuracy of 0.99). Training the model requires a minimum of 100 reference features, from which it learns their characteristics; with these, it achieves high-quality peak-picking results when detecting such chromatographic peaks in an untargeted fashion. PeakBot is implemented in Python (3.8) and uses the TensorFlow (2.5.0) package for machine-learning-related tasks. It has been tested on Linux and Windows.
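The first preprocessing stage described above (detecting local maxima, then extracting a super-sampled, standardized retention-time × m/z area around each one) can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not PeakBot's actual implementation: the profile-mode data are assumed to be already binned into a dense intensity matrix (rows = scans, columns = m/z bins), and the strict 8-neighbour maximum test, bilinear resampling, window half-widths, output size, and apex normalization are all hypothetical choices. The CNN that classifies each area and predicts its center and bounding box is not shown.

```python
import math
from typing import List, Tuple

# Assumed layout (hypothetical): rows = retention-time scans, columns = m/z bins.
Grid = List[List[float]]


def local_maxima(grid: Grid, min_intensity: float = 0.0) -> List[Tuple[int, int]]:
    """Return (rt_index, mz_index) of cells strictly greater than all 8 neighbours."""
    maxima = []
    n_rt, n_mz = len(grid), len(grid[0])
    for i in range(n_rt):
        for j in range(n_mz):
            value = grid[i][j]
            if value <= min_intensity:  # skip noise-level cells
                continue
            neighbours = [
                grid[a][b]
                for a in range(max(0, i - 1), min(n_rt, i + 2))
                for b in range(max(0, j - 1), min(n_mz, j + 2))
                if (a, b) != (i, j)
            ]
            if all(value > n for n in neighbours):
                maxima.append((i, j))
    return maxima


def bilinear(grid: Grid, r: float, c: float) -> float:
    """Bilinearly interpolate the grid at fractional (r, c); points outside give 0."""
    n_rt, n_mz = len(grid), len(grid[0])
    if r < 0 or c < 0 or r > n_rt - 1 or c > n_mz - 1:
        return 0.0
    r0, c0 = int(r), int(c)
    r1, c1 = min(r0 + 1, n_rt - 1), min(c0 + 1, n_mz - 1)
    fr, fc = r - r0, c - c0
    top = grid[r0][c0] * (1 - fc) + grid[r0][c1] * fc
    bottom = grid[r1][c0] * (1 - fc) + grid[r1][c1] * fc
    return top * (1 - fr) + bottom * fr


def standardized_area(grid: Grid, center: Tuple[int, int],
                      half_rt: int = 4, half_mz: int = 2,
                      out_rt: int = 33, out_mz: int = 9) -> Grid:
    """Super-sample a window around `center` onto a fixed out_rt x out_mz grid
    and scale intensities so the largest value is 1 (hypothetical parameters)."""
    ci, cj = center
    area = []
    for k in range(out_rt):
        r = ci - half_rt + 2 * half_rt * k / (out_rt - 1)
        area.append([
            bilinear(grid, r, cj - half_mz + 2 * half_mz * m / (out_mz - 1))
            for m in range(out_mz)
        ])
    peak = max(max(row) for row in area)
    if peak > 0:
        area = [[v / peak for v in row] for row in area]
    return area


if __name__ == "__main__":
    # Synthetic 2D Gaussian peak centred at scan 10, m/z bin 5.
    grid = [[100.0 * math.exp(-((i - 10) ** 2) / 8.0 - ((j - 5) ** 2) / 2.0)
             for j in range(11)] for i in range(21)]
    apexes = local_maxima(grid, min_intensity=1.0)
    print(apexes)  # [(10, 5)]
    area = standardized_area(grid, apexes[0])
    print(len(area), len(area[0]))  # 33 9
```

Odd output sizes are used so that the detected apex falls exactly on the centre sample of the standardized area; every area then has the same shape, which is what a fixed-input CNN would require.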
The package is available free of charge for non-commercial use (CC BY-NC-SA). It is available at https://github.com/christophuv/PeakBot.
Supplementary data are available at Bioinformatics online.