Yang Bo, Chen Chen, Cheng Chen, Cheng Hong, Yan Ziwei, Chen Fangfang, Zhu Zhimin, Zhang Huiting, Yue Feilong, Lv Xiaoyi
College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China.
Photodiagnosis Photodyn Ther. 2021 Mar;33:102199. doi: 10.1016/j.pdpdt.2021.102199. Epub 2021 Jan 27.
Breast cancer screening is time-consuming, requires expensive equipment, and places high demands on physicians. As a result, many breast cancer patients worldwide may miss screening and early treatment, which seriously threatens their health. Infrared spectroscopy may serve as a screening tool for breast cancer detection. In this study, Fourier transform infrared (FT-IR) spectroscopy of serum was combined with traditional machine learning algorithms to provide an auxiliary diagnosis that could quickly and accurately distinguish patients with different stages of breast cancer, including stage I disease, from control subjects without breast cancer.
FT-IR spectroscopy was performed on serum from 114 non-cancer control subjects, 35 patients with stage I, 43 patients with stage II, and 29 patients with stage III & IV breast cancer. Because the sample sizes of the four classes were imbalanced, we balanced them by oversampling with the Synthetic Minority Oversampling Technique (SMOTE). We also ran a separate set of experiments using random undersampling, in which samples were randomly discarded. The average FT-IR spectra of the four groups showed differences in phospholipids, nucleic acids, lipids, and proteins between non-cancer control subjects and breast cancer patients at different stages. Based on these differences, classification models were built to assign samples to four classes: stage I, stage II, and stage III & IV breast cancer, and non-cancer control. First, the standard normal variate (SNV) transformation was used to preprocess the raw spectra, and partial least squares (PLS) was then used for feature extraction. Finally, five models were established: extreme learning machine (ELM), k-nearest neighbor (KNN), support vector machine optimized by genetic algorithm (GA-SVM), particle swarm optimization-support vector machine (PSO-SVM), and grid search-support vector machine (GS-SVM).
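As an illustration of the preprocessing and class-balancing steps named above, here is a minimal pure-Python sketch. It assumes each spectrum is a list of absorbance values on a shared wavenumber grid. The function names (`snv`, `smote`, `oversample_to_majority`, `random_undersample`) are hypothetical, and the SMOTE routine is the textbook interpolation scheme rather than the authors' configuration. The abstract does not say whether resampling was applied before or after SNV and PLS, so their ordering here is an assumption. PLS feature extraction and the SVM optimizers are left out.

```python
import math
import random
import statistics


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and
    divide by its own (sample, n - 1) standard deviation."""
    mu = statistics.mean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(v - mu) / sd for v in spectrum]


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic samples for one class (needs >= 2 samples).

    Each synthetic point lies on the line segment between a randomly
    chosen sample and one of its k nearest neighbours in the same class.
    """
    rng = rng or random.Random()
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Same-class neighbours of x, nearest first (x itself excluded).
        others = sorted(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: math.dist(x, m),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def oversample_to_majority(groups, k=5, rng=None):
    """Top up every class with SMOTE samples until it matches the largest."""
    n_max = max(len(s) for s in groups.values())
    return {label: s + smote(s, n_max - len(s), k, rng)
            for label, s in groups.items()}


def random_undersample(groups, rng=None):
    """Randomly discard samples so each class keeps as many as the smallest."""
    rng = rng or random.Random()
    n_min = min(len(s) for s in groups.values())
    return {label: rng.sample(s, n_min) for label, s in groups.items()}
```

Applied to the full cohort (114, 35, 43 and 29 spectra), `oversample_to_majority` would bring every class up to 114 samples and `random_undersample` would cut every class down to 29. In a held-out evaluation, resampling is normally applied to the training split only, so that synthetic points do not leak into the test set.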
In the oversampling experiment, the GS-SVM classifier achieved the highest average classification accuracy, 95.45%. Its diagnostic accuracy was 100% for non-cancer control subjects, 90% for stage I, 84.62% for stage II, and 100% for stage III & IV breast cancer. In the undersampling experiment, the GA-SVM model achieved the highest average classification accuracy, 100%, with a diagnostic accuracy of 100% for non-cancer control subjects and for stage I, stage II, and stage III & IV breast cancer. The results show that FT-IR spectroscopy combined with powerful classification algorithms has great potential for distinguishing patients with different stages of breast cancer from non-cancer control subjects. This research also provides a reference for future multiclassification studies of cervical cancer, ovarian cancer, and other cancers with high incidence in women using serum FT-IR spectroscopy.
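The per-class figures above can be read as the fraction of each group's test samples that a model labels correctly. Below is a minimal pure-Python sketch of how such figures are computed. It uses a plain k-nearest-neighbour vote (one of the five model types named, but without PLS features or the authors' settings) as the example classifier. The function names are hypothetical. The abstract does not define "average classification accuracy", so `overall_accuracy` (correct predictions over all test samples) is only one plausible reading.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Plain k-nearest-neighbour majority vote with Euclidean distance.

    Ties go to whichever tied label appears first among the neighbours,
    which are sorted nearest first.
    """
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def per_class_accuracy(y_true, y_pred):
    """Fraction of each true class predicted correctly (per-class recall)."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / n for label, n in total.items()}


def overall_accuracy(y_true, y_pred):
    """Fraction of all test samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```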