Department for Pharmaceutical and Pharmacological Sciences, Pharmaceutical Analysis, University of Leuven (KU Leuven), Herestraat 49, Leuven 3000, Belgium; Department of Chemical Engineering, Vrije Universiteit Brussel, Pleinlaan 2, Brussel 1050, Belgium.
Department for Pharmaceutical and Pharmacological Sciences, Pharmaceutical Analysis, University of Leuven (KU Leuven), Herestraat 49, Leuven 3000, Belgium; Department of Pharmaceutical Development and Manufacturing Sciences, Janssen Pharmaceutica, Turnhoutseweg 30, Beerse 2340, Belgium.
J Chromatogr A. 2022 Jun 7;1672:463005. doi: 10.1016/j.chroma.2022.463005. Epub 2022 Mar 31.
Although commercially available software provides options for automatic peak detection, visual inspection and manual corrections are often needed. Commonly employed peak detection algorithms require carefully written rules and thresholds to increase true positive rates and decrease false positive rates. In this study, a deep learning model, specifically a convolutional neural network (CNN), was implemented to perform automatic peak detection in reversed-phase liquid chromatography (RPLC). The model takes a whole chromatogram as input and outputs the predicted locations, probabilities, and areas of the peaks. On a simulated validation set, the model performed well (ROC-AUC of 0.996). On experimental chromatograms, it performed comparably to or better than a derivative-based approach using the Savitzky-Golay algorithm (8.6% increase in true positives). In addition, the predicted peak probabilities (typically between 0.5 and 1.0 for true positives) indicated how confident the CNN model was in the peaks it detected. The CNN model was trained entirely on simulated chromatograms (a training set of 1,000,000 chromatograms), so no effort had to be put into collecting and labeling chromatograms. A potential major drawback of training a CNN model on simulated chromatograms is the risk that the simulations do not capture the actual "chromatogram space" well enough to support accurate peak detection in real chromatograms.
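To make the derivative-based baseline concrete, the following is a minimal sketch of a generic Savitzky-Golay second-derivative peak detector applied to a simulated chromatogram. It is not the authors' implementation: the Gaussian peak model, the function names `simulate_chromatogram` and `detect_peaks_savgol`, and all parameter values (window length, polynomial order, relative threshold, noise level) are illustrative assumptions. It does show the kind of hand-set thresholds such an approach depends on.

```python
import numpy as np
from scipy.signal import savgol_filter, find_peaks


def simulate_chromatogram(centers, heights, widths,
                          n_points=4000, noise_sd=0.001, seed=0):
    """Sum of Gaussian peaks plus white noise (an assumed, simplified model)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_points)
    signal = np.zeros_like(t)
    for c, h, w in zip(centers, heights, widths):
        signal += h * np.exp(-0.5 * ((t - c) / w) ** 2)
    return t, signal + rng.normal(0.0, noise_sd, n_points)


def detect_peaks_savgol(t, y, window=51, polyorder=3, rel_threshold=0.2):
    """Locate peak apexes as maxima of the negated, smoothed second derivative.

    window, polyorder and rel_threshold are hypothetical settings that
    would normally need tuning for each data set.
    """
    dt = t[1] - t[0]
    # Savitzky-Golay smoothing and differentiation in a single pass.
    d2 = savgol_filter(y, window_length=window, polyorder=polyorder,
                       deriv=2, delta=dt)
    curvature = -d2  # a peak apex is strongly concave, so -d2 is large there
    threshold = rel_threshold * curvature.max()
    # 'distance' suppresses duplicate maxima within one filter window.
    idx, _ = find_peaks(curvature, height=threshold, distance=window)
    return t[idx]


if __name__ == "__main__":
    t, y = simulate_chromatogram(centers=[0.2, 0.5, 0.8],
                                 heights=[1.0, 0.6, 0.8],
                                 widths=[0.01, 0.01, 0.012])
    print(detect_peaks_savgol(t, y))
```

In this sketch a peak is reported only if its smoothed curvature exceeds a fixed fraction of the largest curvature in the trace, so broad or low peaks can be missed unless `rel_threshold` and `window` are adjusted. This is the sort of rule tuning that a learned detector aims to avoid.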