Zaki Jihan K, Tomasik Jakub, McCune Jade A, Bahn Sabine, Lió Pietro, Scherman Oren A
Melville Laboratory for Polymer Synthesis, Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Rd, Cambridge CB2 1EW, U.K.
Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, U.K.
ACS Sens. 2025 Sep 26;10(9):6597-6606. doi: 10.1021/acssensors.5c01058. Epub 2025 Sep 2.
Surface-enhanced Raman spectroscopy (SERS) is rapidly gaining attention as a fast and inexpensive method of biomarker quantification that can be combined with deep learning to elucidate complex biomarker-disease relationships. Current standard practices in SERS analysis lag behind state-of-the-art machine learning approaches; however, the present challenges of SERS analysis could be effectively addressed with a robust computational framework. Furthermore, SERS analysis needs improved model explainability, as current methods are insufficient for assessing the contexts in which confounding factors affect prediction outcomes. This study presents a framework for SERS bioquantification built on a three-step process: spectral processing, quantification, and explainability. Serotonin quantification in urine was assessed as a model task, with 682 SERS spectra measured in the micromolar range using cucurbit[8]uril chemical spacers. A denoising autoencoder was used for spectral enhancement, while convolutional neural networks (CNNs) and vision transformers were used for biomarker quantification. In addition, a context representative interpretable model explanation (CRIME) method was developed to meet the current explainability needs of SERS mixture analysis. Serotonin quantification performed best on denoised spectra analyzed using a CNN with a three-parameter logistic output layer (mean absolute error = 0.15 μM, mean percentage error = 4.67%). Subsequently, the CRIME method revealed that the CNN model exhibited six unique prediction contexts, of which three were associated with serotonin. Given the rapid and inexpensive nature of SERS measurements and the potential to identify biomarkers from CRIME contexts, the proposed framework could unlock a novel, untargeted, hypothesis-generating method of biomarker discovery.
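As an illustration of the quantification step, the sketch below shows one common form of a three-parameter logistic function, together with the two error metrics quoted in the abstract. The abstract does not give the parameterization used by the authors, so the form f(z) = upper / (1 + exp(-slope * (z - midpoint))), the function names, and the reading of "mean percentage error" as a mean absolute percentage error are all assumptions made here for illustration. The concentrations in the usage example are made up and are not data from the study.

```python
import numpy as np


def three_param_logistic(z, upper, slope, midpoint):
    """Map a raw network output z to a bounded concentration estimate.

    Assumed form (not confirmed by the source):
        f(z) = upper / (1 + exp(-slope * (z - midpoint)))
    """
    z = np.asarray(z, dtype=float)
    return upper / (1.0 + np.exp(-slope * (z - midpoint)))


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, in the units of y (e.g. micromolar)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true)))


def mean_percentage_error(y_true, y_pred):
    """Average absolute error relative to the true value, in percent.

    Assumes all true values are non-zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true) / np.abs(y_true)) * 100.0)


if __name__ == "__main__":
    # Hypothetical raw outputs and true concentrations (micromolar).
    raw_outputs = np.array([-1.0, 0.0, 1.0])
    true_conc = np.array([1.5, 2.5, 3.5])
    predicted = three_param_logistic(raw_outputs, upper=5.0, slope=1.0, midpoint=0.0)
    print("MAE (uM):", mean_absolute_error(true_conc, predicted))
    print("MPE (%):", mean_percentage_error(true_conc, predicted))
```

A bounded output of this kind keeps predictions between zero and the upper asymptote, which is one plausible reason to prefer it over an unbounded linear output when the measured concentration range is known.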