Institute of Physical Chemistry (IPC) and Abbe Center of Photonics (ACP), Friedrich Schiller University Jena, Member of the Leibniz Centre for Photonics in Infection Research (LPI), Helmholtzweg 4, 07743 Jena, Germany.
Leibniz Institute of Photonic Technology, Member of Leibniz Health Technologies, Member of the Leibniz Centre for Photonics in Infection Research (LPI), Albert Einstein Straße 9, 07745 Jena, Germany.
Molecules. 2024 Feb 28;29(5):1061. doi: 10.3390/molecules29051061.
Identifying bacterial strains is essential in microbiology for practical applications such as disease diagnosis and quality monitoring of food and water. Classical machine learning algorithms have been used to identify bacteria from their Raman spectra. Convolutional neural networks (CNNs) offer higher classification accuracy, but they require extensive training sets, and retraining them for previously untrained class targets can be costly and time-consuming. Siamese networks have emerged as a promising solution. They are composed of two CNNs with the same structure and a final network that acts as a distance metric, converting the classification problem into a similarity problem. Classical machine learning approaches, shallow and deep CNNs, and two Siamese network variants were tailored and tested on Raman spectral datasets of bacteria. The methods were evaluated based on mean sensitivity, training time, prediction time, and the number of parameters. In this comparison, Siamese-model2 achieved the highest mean sensitivity of 83.61 ± 4.73 and demonstrated remarkable performance in handling unbalanced and limited data scenarios, achieving a prediction accuracy of 73%. The choice of model therefore depends on the trade-off between accuracy, prediction and training time, and resources for the particular application. Classical machine learning models and shallow CNN models may be more suitable if time and computational resources are a concern. Siamese networks are a good choice for small datasets, and CNNs for extensive data.
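The idea of turning classification into a similarity problem can be illustrated with a minimal sketch. It is not the authors' model: the two CNN branches are replaced by a single untrained random projection that both inputs share, and the learned final network is replaced by a plain Euclidean distance. The names `make_encoder`, `distance`, and `classify`, and the four-point toy "spectra", are hypothetical and chosen only for illustration.

```python
import math
import random


def make_encoder(n_features, n_embed, seed=0):
    """Build one embedding function. Both inputs of a pair pass through
    this same function, mimicking the shared weights of the two branches.
    Assumption: a random tanh projection stands in for a trained CNN."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_features)
    weights = [[rng.gauss(0.0, 1.0) * scale for _ in range(n_features)]
               for _ in range(n_embed)]

    def encode(spectrum):
        return [math.tanh(sum(w * x for w, x in zip(row, spectrum)))
                for row in weights]

    return encode


def distance(a, b):
    """Euclidean distance between two embeddings.
    Assumption: a fixed metric stands in for the learned final network."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(query, references, encode):
    """Return the label whose reference spectrum lies closest to the query
    in embedding space. `references` maps label -> list of spectra."""
    q = encode(query)
    best_label, best_dist = None, math.inf
    for label, spectra in references.items():
        for spectrum in spectra:
            d = distance(q, encode(spectrum))
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label


if __name__ == "__main__":
    encode = make_encoder(n_features=4, n_embed=3, seed=0)
    # Toy reference "spectra", one per hypothetical strain.
    references = {
        "A": [[1.0, 0.2, 0.1, 0.0]],
        "B": [[0.1, 0.2, 1.0, 0.3]],
    }
    print(classify([0.1, 0.2, 1.0, 0.3], references, encode))
    # A new class is added by extending the references, without retraining.
    references["C"] = [[0.0, 1.0, 0.0, 0.9]]
    print(classify([0.0, 1.0, 0.0, 0.9], references, encode))
```

In this sketch, a previously unseen class is handled by adding reference spectra to the dictionary rather than by retraining a classifier head, which is the property that makes similarity-based models attractive when data are limited. A trained Siamese network would learn both the encoder and the distance from labelled pairs.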