Centre de recherche du Centre hospitalier de l'Université de Montréal, Montreal, Quebec, Canada.
Institut du cancer de Montréal, Montreal, Quebec, Canada.
J Biomed Opt. 2021 Nov;26(11). doi: 10.1117/1.JBO.26.11.116501.
Prostate cancer is the most common cancer among men. An accurate diagnosis of its severity at detection plays a major role in improving patient survival. Recently, machine learning models using biomarkers identified from Raman micro-spectroscopy discriminated intraductal carcinoma of the prostate (IDC-P) from cancer tissue with ≥85% detection accuracy and differentiated high-grade prostatic intraepithelial neoplasia (HGPIN) from IDC-P with ≥97.8% accuracy.
The aim was to improve the classification performance of machine learning models that identify different types of prostate cancer tissue, using a new dimensionality reduction technique.
A radial basis function (RBF) kernel support vector machine (SVM) model was trained on Raman spectra of prostate tissue from a 272-patient cohort (Centre hospitalier de l'Université de Montréal, CHUM) and tested on two independent cohorts of 76 patients (University Health Network, UHN) and 135 patients (Centre hospitalier universitaire de Québec-Université Laval, CHUQc-UL). Two types of engineered features were used: individual intensity features, i.e., the Raman signal intensity measured at particular wavelengths, and novel fitted peak features, consisting of the heights and widths of peaks fitted to the Raman spectra.
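The two feature types and the RBF kernel described above can be illustrated with a minimal sketch. The abstract does not specify the fitting procedure, the peak shape, or which spectral positions were used, so everything here is an assumption made for illustration: a Gaussian peak shape, a three-point fit in log space, a uniformly sampled and baseline-corrected spectrum, and the hypothetical helper names `intensity_at`, `fitted_peak`, `combined_features`, and `rbf_kernel`. The spectrum is synthetic, not study data.

```python
import math


def intensity_at(axis, spectrum, position):
    """Individual intensity feature: signal at the sample nearest `position`."""
    idx = min(range(len(axis)), key=lambda i: abs(axis[i] - position))
    return spectrum[idx]


def fitted_peak(axis, spectrum, lo, hi):
    """Fitted peak feature: (height, full width at half maximum).

    Assumption: the band is Gaussian. The log of a Gaussian is a parabola,
    so the maximum sample in [lo, hi] and its two neighbours determine it.
    Requires a uniform axis and strictly positive intensities.
    """
    window = [i for i in range(1, len(axis) - 1) if lo <= axis[i] <= hi]
    k = max(window, key=lambda i: spectrum[i])
    y0, y1, y2 = (math.log(spectrum[j]) for j in (k - 1, k, k + 1))
    curvature = y0 - 2.0 * y1 + y2  # negative at a genuine peak
    if curvature >= 0.0:
        raise ValueError("no peak found in the requested window")
    step = axis[k + 1] - axis[k]
    sigma = step * math.sqrt(-1.0 / curvature)
    height = math.exp(y1 - 0.125 * (y0 - y2) ** 2 / curvature)
    fwhm = 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma
    return height, fwhm


def combined_features(axis, spectrum, positions, windows):
    """Concatenate intensity features with fitted peak heights and widths."""
    features = [intensity_at(axis, spectrum, p) for p in positions]
    for lo, hi in windows:
        features.extend(fitted_peak(axis, spectrum, lo, hi))
    return features


def rbf_kernel(u, v, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


# Synthetic one-band spectrum (illustrative values only).
axis = [900.0 + i for i in range(201)]
spectrum = [5.0 * math.exp(-((x - 1004.3) ** 2) / (2.0 * 4.0 ** 2)) for x in axis]
features = combined_features(axis, spectrum, positions=[1004.0],
                             windows=[(990.0, 1020.0)])
```

In practice the combined vectors would be standardized and passed to an SVM implementation; the kernel function is shown only to make explicit how two feature vectors are compared.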
Combining engineered features improved classification performance across the three classification tasks (IDC-P versus cancer, HGPIN versus IDC-P, and normal versus cancer). For IDC-P/cancer classification, the improvements over the current best models in accuracy, sensitivity, specificity, and area under the curve (AUC) on the UHN testing set (CHUQc-UL testing set in parentheses) were +4% (+8%), +7% (+9%), +2% (+6%), and +9 (+9). Discrimination between HGPIN and IDC-P also improved in both testing cohorts: +2.2% (+1.7%), +4.5% (+3.6%), +0% (+0%), and +2.3 (+0). Although no overall improvement was obtained for the normal versus cancer classification task [+0% (-2%), +0% (-3%), +2% (-2%), +4 (+3)], the AUC improved in both testing sets.
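For readers less familiar with the four reported metrics, the following sketch shows how accuracy, sensitivity, specificity, and AUC are commonly computed for a binary task. It is not the authors' evaluation code. The function names `binary_metrics` and `auc` are hypothetical, the positive class is assumed to be labeled 1, and the labels and scores are made-up toy values rather than study data. AUC is computed here as the fraction of positive/negative pairs ranked correctly, with ties counted as one half.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from hard 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels, predictions and classifier scores (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]
metrics = binary_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

Because AUC depends on the ranking of scores rather than on a single decision threshold, it can improve even when accuracy, sensitivity, and specificity at the chosen threshold do not, which is consistent with the pattern reported for the normal versus cancer task.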
Combining individual intensity features with novel Raman fitted peak features improved classification performance on two independent, multicenter testing sets compared with using individual intensity features alone.