Yu Shuxian, Jiang Haiyang, Xia Jing, Gu Jie, Chen Mengting, Wang Yan, Zhao Xiaohong, Liao Zehua, Zeng Puhua, Xie Tian, Sui Xinbing
School of Pharmacy, Hangzhou Normal University, Hangzhou, China.
The First Affiliated Hospital of Zhejiang Chinese Medicine University, Hangzhou, China.
Chin Med. 2025 Jan 7;20(1):7. doi: 10.1186/s13020-025-01059-4.
The individualized prediction and discrimination of precancerous lesions of gastric cancer (PLGC) is critical for the early prevention of gastric cancer (GC). However, accurate non-invasive methods for distinguishing between PLGC and GC are currently lacking. This study therefore aimed to develop a risk prediction model using machine learning and deep learning techniques to aid the early diagnosis of GC.
A total of 2229 subjects were recruited from nine tertiary hospitals between October 2022 and November 2023. We designed a comprehensive questionnaire, identified statistically significant factors, and created a web-based nomogram. A risk prediction model was then developed using machine learning techniques. In addition, a tongue image-based risk prediction model was established using deep learning algorithms.
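A web-based risk chart of this kind is usually a front end to a fitted logistic regression: each significant factor adds a weighted term to a linear score, and the score is converted to a probability. The sketch below illustrates that mapping only. The factor names, coefficients, and intercept are hypothetical placeholders and are not values from the study.

```python
import math

# Hypothetical intercept and coefficients, for illustration only.
# They are NOT the fitted values from the study.
INTERCEPT = -3.0
COEFFICIENTS = {
    "age_over_60": 1.2,
    "irregular_meals": 0.8,
    "pickled_food": 0.6,
}


def predicted_risk(features: dict) -> float:
    """Return the logistic-regression probability for binary (0/1) features."""
    linear_score = INTERCEPT + sum(
        COEFFICIENTS[name] * features.get(name, 0) for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-linear_score))


def nomogram_points(features: dict) -> dict:
    """Rescale each factor's contribution so the largest coefficient is 100 points."""
    largest = max(abs(c) for c in COEFFICIENTS.values())
    return {
        name: 100.0 * COEFFICIENTS[name] * features.get(name, 0) / largest
        for name in COEFFICIENTS
    }


if __name__ == "__main__":
    patient = {"age_over_60": 1, "irregular_meals": 1, "pickled_food": 0}
    print(nomogram_points(patient))
    print(round(predicted_risk(patient), 3))
```

The point scale is only a presentation device: the probability depends solely on the linear score, so rescaling the coefficients to points does not change the predicted risk.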
Based on logistic regression analysis, a dynamic web-based nomogram was developed; it is freely accessible at https://yz6677.shinyapps.io/GC67/ . A prediction model was then built using ten different machine learning algorithms, of which the Random Forest (RF) model achieved the highest accuracy, 85.65%. According to the predictive results, the top 10 key risk factors were age, traditional Chinese medicine (TCM) constitution type, tongue coating color, tongue color, irregular meals, pickled food, greasy fur, over-hot eating habit, anxiety and sleep onset latency. These factors are all significant risk indicators for progression from PLGC to GC. Subsequently, the Swin Transformer architecture was used to develop a tongue image-based model for predicting the risk of PLGC progression. On the verification set, accuracy was 73.33% and the area under the curve (AUC) was greater than 0.8 across all models.
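For readers less familiar with the two metrics reported above, the following minimal sketch shows how accuracy and AUC are computed for a binary classifier. The labels and scores are made-up toy values, not study data, and the 0.5 decision threshold is an assumption chosen for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case; tied scores count as half."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data: 1 = GC, 0 = PLGC (illustrative only).
    y_true = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(accuracy(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds, which is why the two figures can differ noticeably for the same model.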
Our study developed machine learning- and deep learning-based models for predicting the risk of progression from PLGC to GC, which will help identify high-risk patients among those with PLGC and improve the early diagnosis of GC.