Li Mengya, He Haiyan, Huang Guorong, Lin Bo, Tian Huiyan, Xia Ke, Yuan Changjing, Zhan Xinyu, Zhang Yang, Fu Weiling
Department of Laboratory Medicine, First Affiliated Hospital, Third Military Medical University (Army Medical University), Chongqing, China.
Department of Laboratory Medicine, Chongqing University Cancer Hospital, Chongqing, China.
Front Oncol. 2021 Sep 27;11:665176. doi: 10.3389/fonc.2021.665176. eCollection 2021.
Gastric cancer (GC) is the fifth most common cancer worldwide and a serious threat to human health. Because of its high morbidity and mortality, a simple, rapid, and accurate early screening method for GC is urgently needed. This study explored whether Raman spectroscopy combined with different machine learning methods can distinguish serum samples from GC patients and healthy controls. Serum Raman spectra were collected from 109 patients with GC (35 in stage I, 14 in stage II, 35 in stage III, and 25 in stage IV) and from 104 age-matched healthy volunteers presenting for a routine physical examination. Differences in serum metabolism between GC patients and healthy people were analyzed by comparing the average Raman spectra of the two groups. Four machine learning methods, a one-dimensional convolutional neural network, random forest (RF), support vector machine, and K-nearest neighbor, were used to classify the two sets of Raman spectral data. The classification models were built using 70% of the data as a training set and 30% as a test set. When tested on unseen data, the RF model achieved an accuracy of 92.8%, with a sensitivity of 94.7% and a specificity of 90.8%. The performance of the RF model was further confirmed by the receiver operating characteristic (ROC) curve, with an area under the curve (AUC) of 0.9199. This exploratory work shows that serum Raman spectroscopy combined with RF has great potential for the machine-assisted classification of GC and is expected to provide a non-destructive and convenient technology for screening GC patients.
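The evaluation described above (a 70%/30% split, then accuracy, sensitivity, specificity, and AUC on the held-out part) can be illustrated with a minimal Python sketch. It is not the authors' code. The helper names `stratified_split`, `binary_metrics`, and `roc_auc` are hypothetical, the split is assumed to be stratified per subject (the abstract does not say whether subjects or individual spectra were split), and the predictions are made-up toy values rather than study data.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices into train/test, sampling each class separately."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Cohort sizes from the abstract: 109 GC patients (label 1), 104 controls (label 0).
labels = [1] * 109 + [0] * 104
train_idx, test_idx = stratified_split(labels, test_fraction=0.3)

# Toy predictions (NOT study data) to show how the metrics are computed.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0]
toy_score = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_metrics = binary_metrics(toy_true, toy_pred)
toy_auc = roc_auc(toy_true, toy_score)

print(len(train_idx), len(test_idx))
print(toy_metrics, toy_auc)
```

In a real pipeline the toy predictions would be replaced by the outputs of a classifier trained only on the training indices, with positive-class probabilities used as the scores for the ROC curve.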