Thapa Susan, Fischbach Lori A, Delongchamp Robert, Faramawi Mohammed F, Orloff Mohammed S
Department of Epidemiology, College of Public Health, University of Arkansas for Medical Sciences, Little Rock 72205, USA.
Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock 72205, USA.
Gastroenterol Res Pract. 2019 Apr 1;2019:8321942. doi: 10.1155/2019/8321942. eCollection 2019.
Gastric cancer is the fourth most common cancer and the third most common cause of cancer deaths worldwide. Morbidity and mortality from gastric cancer may be decreased by identifying patients who are at high risk of progression in the gastric precancerous process, so that they can be monitored over time for early detection and for implementation of preventive strategies.
Using machine learning, we developed prediction models for gastric precancerous progression in patients from a developing country with a high rate of gastric cancer who underwent gastroscopy for dyspeptic symptoms. After imputing missing values to complete the data, we divided the data into a training set and a validation (test) set. In the training set, we used the random forest method to rank potential predictors by their predictive importance. Using the predictors identified by the random forest method, we conducted best subset linear regressions with leave-one-out cross-validation to select predictors for overall progression and for progression to dysplasia or cancer. We validated the models in the test set using leave-one-out cross-validation.
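The selection step described above can be illustrated with a minimal sketch of best subset linear regression scored by leave-one-out cross-validation. This is not the authors' code: the toy data, the function names (`fit_ols`, `loocv_mse`, `best_subset`), and the choice of mean squared error as the score are assumptions made for illustration, and the random forest ranking is assumed to have already narrowed the candidates to three columns.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    XtX = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(XtX, Xty)

def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

def loocv_mse(X, y, cols):
    """Hold out each observation once, fit on the rest, average squared error."""
    errors = []
    for i in range(len(y)):
        X_train = [[r[c] for c in cols] for k, r in enumerate(X) if k != i]
        y_train = [v for k, v in enumerate(y) if k != i]
        beta = fit_ols(X_train, y_train)
        pred = predict(beta, [X[i][c] for c in cols])
        errors.append((y[i] - pred) ** 2)
    return sum(errors) / len(errors)

def best_subset(X, y, max_size):
    """Return the column subset (up to max_size) with the lowest LOOCV error."""
    p = len(X[0])
    candidates = [c for k in range(1, max_size + 1)
                  for c in combinations(range(p), k)]
    return min(candidates, key=lambda c: loocv_mse(X, y, c))

# Hypothetical toy data: the outcome depends mainly on columns 0 and 2.
X = [[1, 0, 2], [2, 1, 1], [3, 0, 4], [4, 1, 3],
     [5, 0, 1], [6, 1, 5], [7, 0, 2], [8, 1, 6]]
y = [8.1, 6.9, 18.2, 16.8, 13.1, 26.9, 20.1, 33.9]

best = best_subset(X, y, max_size=2)
print("selected columns:", best)
```

Exhaustive search grows exponentially with the number of candidate predictors, which is one reason a prior ranking step that shortens the candidate list is useful.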
We observed for all models that complete intestinal metaplasia and incomplete intestinal metaplasia were the strongest predictors for further progression in the precancerous process. We also observed that a diagnosis of no gastritis, superficial gastritis, or antral diffuse gastritis at baseline was a predictor of no progression in the gastric precancerous process. The sensitivities and specificities were 86% and 79% for the general model and 100% and 82% for the location-specific model, respectively.
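For readers less familiar with the two metrics reported above, the following sketch shows how sensitivity and specificity are computed from binary predictions. The labels are made up for illustration and do not reproduce the study's data or results; the function name `sensitivity_specificity` is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1 and 0.

    Here 1 stands for progression and 0 for no progression.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of true progressors correctly flagged
    specificity = tn / (tn + fp)  # share of non-progressors correctly cleared
    return sensitivity, specificity

# Hypothetical labels: 4 progressors and 6 non-progressors.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In a screening context such as this one, high sensitivity means few patients who go on to progress are missed, while specificity determines how many follow-up gastroscopies are spent on patients who would not have progressed.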
We developed prediction models to identify gastroscopy patients who are more likely to progress in the gastric precancerous process, and in whom routine follow-up gastroscopies can be targeted to prevent gastric cancer. Future external validation is needed.