Blake Nathan, Gaifulina Riana, Griffin Lewis D, Bell Ian M, Thomas Geraint M H
Department of Cell and Developmental Biology, University College London, London WC1E 6BT, UK.
Department of Computer Science, University College London, London WC1E 6BT, UK.
Diagnostics (Basel). 2022 Jun 17;12(6):1491. doi: 10.3390/diagnostics12061491.
Raman spectroscopy has long been anticipated to augment clinical decision making, for example by classifying oncological samples. Unfortunately, the complexity of Raman data has thus far inhibited its routine use in clinical settings. Traditional machine learning models have been used to help exploit this information, but recent advances in deep learning have the potential to improve the field. However, there are a number of potential pitfalls with both traditional and deep learning models. We conduct a literature review to ascertain the recent machine learning methods used to classify cancers using Raman spectral data. We find that while deep learning models are popular, and ostensibly outperform traditional learning models, many methodological considerations may be leading to an over-estimation of performance: primarily, small sample sizes, which compound sub-optimal choices regarding sampling and validation strategies. Among several recommendations is a call to collate large benchmark Raman datasets, similar to those that have helped transform digital pathology, which researchers can use to develop and refine deep learning models.