Dixon Katelyn, Bonon Raissa, Ivander Felix, Ale Ebrahim Saba, Namdar Khashayar, Shayegannia Moein, Khalvati Farzad, Kherani Nazir P, Zavodni Anna, Matsuura Naomi
Department of Electrical and Computer Engineering, University of Toronto, Toronto M5S 1A4, Canada.
Institute of Biomedical Engineering, University of Toronto, Toronto M5S 3E2, Canada.
ACS Appl Nano Mater. 2023 Aug 22;6(17):15385-15396. doi: 10.1021/acsanm.3c01442. eCollection 2023 Sep 8.
Characterizing complex biofluids using surface-enhanced Raman spectroscopy (SERS) coupled with machine learning (ML) has been proposed as a powerful tool for point-of-care detection of clinical disease. ML is well-suited to categorizing otherwise uninterpretable, patient-derived SERS spectra that contain a multitude of low concentration, disease-specific molecular biomarkers among a dense spectral background of biological molecules. However, ML can generate false, non-generalizable models when data sets used for model training are inadequate. It is thus critical to determine how different SERS experimental methodologies and workflow parameters can potentially impact ML disease classification of clinical samples. In this study, a label-free, broadband, Ag nanoparticle-based SERS platform was coupled with ML to assess simulated clinical samples for cardiovascular disease (CVD), containing randomized combinations of five key CVD biomarkers at clinically relevant concentrations in serum. Raman spectra obtained at 532, 633, and 785 nm from up to 300 unique samples were classified into physiological and pathological categories using two standard ML models. Label-free SERS and ML could correctly classify randomized CVD samples with high accuracies of up to 90.0% at 532 nm using as few as 200 training samples. Spectra obtained at 532 nm produced the highest accuracies with no significant increase achieved using multiwavelength SERS. Sample preparation and measurement methodologies (e.g., different SERS substrate lots, sample volumes, sample sizes, and known variations in randomization and experimental handling) were shown to strongly influence the ML classification and could artificially increase classification accuracies by as much as 27%. 
This detailed investigation into the proper application of ML techniques for CVD classification can lead to improved data set acquisition required for the SERS community, such that ML on labeled and robust SERS data sets can be practically applied for future point-of-care testing in patients.
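One way handling choices can artificially raise classification accuracy is confounding: if every pathological sample happens to be measured on one substrate lot and every physiological sample on another, a classifier can separate the lots rather than the biomarkers. The sketch below illustrates that effect on purely synthetic data. It is not the authors' pipeline. The feature count, the effect sizes, the nearest-centroid classifier, and the names `make_spectra` and `nearest_centroid_accuracy` are all assumptions chosen for illustration.

```python
import numpy as np


def make_spectra(rng, n_samples, confounded, n_features=50):
    """Generate synthetic 'spectra' with a weak class signal and a strong lot offset.

    All magnitudes here are hypothetical and not taken from the study.
    """
    labels = rng.integers(0, 2, size=n_samples)  # 0 = physiological, 1 = pathological
    if confounded:
        lots = labels.copy()  # each class measured on its own substrate lot
    else:
        lots = rng.integers(0, 2, size=n_samples)  # lot assigned independently of class
    spectra = rng.normal(0.0, 1.0, size=(n_samples, n_features))
    spectra[:, :5] += 0.3 * labels[:, None]  # weak biomarker signal in five features
    spectra += 2.0 * lots[:, None]  # strong lot-dependent baseline shift
    return spectra, labels


def nearest_centroid_accuracy(spectra, labels, rng, n_train=200):
    """Random train/test split, then classify test spectra by the nearest class centroid."""
    order = rng.permutation(len(labels))
    train, test = order[:n_train], order[n_train:]
    centroids = np.stack(
        [spectra[train][labels[train] == c].mean(axis=0) for c in (0, 1)]
    )
    dists = np.linalg.norm(spectra[test][:, None, :] - centroids[None, :, :], axis=2)
    predicted = dists.argmin(axis=1)
    return float((predicted == labels[test]).mean())


rng = np.random.default_rng(0)
X_conf, y_conf = make_spectra(rng, 300, confounded=True)
X_rand, y_rand = make_spectra(rng, 300, confounded=False)
acc_confounded = nearest_centroid_accuracy(X_conf, y_conf, rng)
acc_randomized = nearest_centroid_accuracy(X_rand, y_rand, rng)
print(f"lot confounded with class: {acc_confounded:.2f}")
print(f"lot randomized:            {acc_randomized:.2f}")
```

In this toy setup the confounded run scores far higher only because the lot offset is a proxy for the label. Randomizing lot assignment removes that shortcut and exposes the much weaker biomarker signal.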