Lange Christoph, Borisyak Maxim, Kögler Martin, Born Stefan, Ziehe Andreas, Neubauer Peter, Bournazou M Nicolas Cruz
Technische Universität Berlin, Faculty III Process Sciences, Institute of Biotechnology, Chair of Bioprocess Engineering, Straße des 17. Juni 135, Berlin, 10623, Berlin, Germany.
Spectrochim Acta A Mol Biomol Spectrosc. 2025 Jun 5;334:125861. doi: 10.1016/j.saa.2025.125861. Epub 2025 Feb 11.
In biotechnology, Raman spectroscopy is becoming increasingly popular as a process analytical technology (PAT) for measuring the concentrations of substrates, metabolites, and product-related species. By recording the vibrational modes of molecular bonds, it provides information non-invasively in the form of a high-dimensional spectrum. Machine learning models are used to transform these spectral data into meaningful species concentrations. Typically, one assumes a linear relationship between intensity and concentration and learns this relationship with a partial least squares (PLS) model. However, in biological cultivations with a very large number of components, nonlinear models such as convolutional neural networks (CNNs) offer significant advantages. In this work, we show that a single CNN trained on spectra from eight different spectrometers significantly outperforms PLS models. Specifically, we prepared samples with known concentrations of glucose, sodium acetate, and magnesium sulfate and measured more than 2200 spectra of these samples with the eight spectrometers. We trained one CNN on the spectra from all eight datasets simultaneously. This shows great potential for laboratories with data from more than one spectrometer: rather than spending extra effort calibrating individual PLS models, they can use a joint CNN, which even improves the overall accuracy. In addition, we compare the eight spectrometers against each other. The results suggest that, given the models, three of the spectrometers are better suited for quantifying glucose, sodium acetate, and magnesium sulfate.
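The linear intensity-concentration assumption mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' PLS or CNN pipeline: it uses classical least squares with known pure-component spectra, whereas PLS learns the mapping from calibration data. The band positions, band widths, noise level, and concentrations below are illustrative assumptions, not values from the paper, and all function names are hypothetical.

```python
import math
import random

# Illustrative band centres in cm^-1 (assumed values, not taken from the paper).
PEAKS = {"glucose": 1125.0, "sodium_acetate": 928.0, "magnesium_sulfate": 981.0}
SHIFTS = [800.0 + 2.0 * i for i in range(250)]  # Raman shift axis, 800..1298 cm^-1


def pure_spectrum(center, width=8.0):
    """Gaussian band standing in for a pure-component spectrum at unit concentration."""
    return [math.exp(-0.5 * ((s - center) / width) ** 2) for s in SHIFTS]


PURE = [pure_spectrum(c) for c in PEAKS.values()]


def simulate(conc, noise=0.01, rng=None):
    """Linear mixing: intensity = sum_k conc_k * pure_k + Gaussian noise."""
    rng = rng or random.Random(0)
    return [
        sum(c * p[i] for c, p in zip(conc, PURE)) + rng.gauss(0.0, noise)
        for i in range(len(SHIFTS))
    ]


def solve(a, b):
    """Solve the small system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate(spectrum):
    """Least-squares concentrations via the normal equations (P P^T) c = P x."""
    gram = [[sum(x * y for x, y in zip(p, q)) for q in PURE] for p in PURE]
    rhs = [sum(x * y for x, y in zip(p, spectrum)) for p in PURE]
    return solve(gram, rhs)


true_conc = [2.0, 0.5, 1.0]  # hypothetical concentrations in arbitrary units
estimated = estimate(simulate(true_conc))
print([round(c, 2) for c in estimated])
```

Under these idealized conditions the linear model recovers the concentrations closely. In a real cultivation, overlapping bands from many components and instrument-specific effects weaken this assumption, which is the motivation the abstract gives for nonlinear models such as a CNN.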