CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China.
Talanta. 2021 Jan 15;222:121580. doi: 10.1016/j.talanta.2020.121580. Epub 2020 Aug 28.
Feature detection is a crucial pre-processing step in high-resolution liquid chromatography-mass spectrometry (LC-MS) data analysis. Typical approaches based on thresholds or rigid mathematical assumptions can perform poorly when detecting low-abundance compounds and compounds with non-ideal distributions. We herein introduce SeA-M2Net, a novel deep learning-based feature detection method that treats feature detection as an image-based object detection task. Because the method uses the raw data directly and integrates all related factors (e.g., LC elution, charge state, and isotope distribution) into two-dimensional pseudocolor images to calculate the probability that a compound is present, low-abundance compounds are well preserved and remain observable. More importantly, SeA-M2Net, with its deep multilevel and multiscale structures, detects compound patterns in a learned manner instead of assuming a parametric mathematical model. All parameters in SeA-M2Net are learned from data during training, which allows maximum flexibility in accommodating deformations of the pattern distribution. The algorithm was tested on several LC-MS datasets of multiple biological samples obtained from different instruments with varied experimental settings. We demonstrate the superiority of the new approach in handling complex compound patterns (e.g., low abundance, overlapping regions, LC shifts, and missing values). Our experiments indicate that SeA-M2Net outperforms widely used detection methods in terms of detection accuracy.
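The idea of casting LC-MS data as a two-dimensional image can be illustrated with a minimal sketch. It is not the SeA-M2Net preprocessing, which the abstract does not specify; the function names `rasterize_lcms` and `to_pseudocolor`, the 64 × 64 grid, the log scaling, and the blue-to-red color ramp are all assumptions chosen for illustration. The synthetic input mimics one doubly charged compound: three isotope peaks spaced 0.5 m/z apart, each following a Gaussian elution profile.

```python
import numpy as np

def rasterize_lcms(rt, mz, intensity, rt_bins=64, mz_bins=64):
    """Bin (rt, mz, intensity) triples into a 2D log-intensity image in [0, 1].

    Hypothetical helper: rows index retention time, columns index m/z.
    """
    rt = np.asarray(rt, dtype=float)
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    # Sum the intensities of all peaks that fall into the same pixel.
    image, rt_edges, mz_edges = np.histogram2d(
        rt, mz, bins=(rt_bins, mz_bins), weights=intensity
    )
    # Log scaling keeps low-abundance signals visible next to intense ones.
    image = np.log1p(image)
    peak = image.max()
    if peak > 0:
        image = image / peak
    return image, rt_edges, mz_edges

def to_pseudocolor(image):
    """Map a [0, 1] image to RGB with a simple blue-to-red ramp (assumed colormap)."""
    red = image
    green = 1.0 - np.abs(2.0 * image - 1.0)
    blue = 1.0 - image
    return np.stack([red, green, blue], axis=-1)

# Synthetic compound: charge 2, so isotope peaks are 0.5 m/z apart.
scan_rt = np.linspace(10.0, 12.0, 40)                    # retention times (min)
elution = np.exp(-0.5 * ((scan_rt - 11.0) / 0.2) ** 2)   # Gaussian LC profile
iso_mz = 500.0 + np.arange(3) / 2                        # 500.0, 500.5, 501.0
iso_abund = np.array([1.0, 0.6, 0.25])                   # relative isotope heights

rt = np.repeat(scan_rt, iso_mz.size)
mz = np.tile(iso_mz, scan_rt.size)
intensity = (elution[:, None] * iso_abund[None, :]).ravel() * 1e5

image, rt_edges, mz_edges = rasterize_lcms(rt, mz, intensity)
rgb = to_pseudocolor(image)
print(image.shape, rgb.shape)
```

In the resulting image the compound appears as three parallel streaks along the retention-time axis, which is the kind of joint elution, charge, and isotope pattern an object detector could be trained to localize.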