Kim Michelle Kang, Rouphael Carol, Wehbe Sarah, Yoon Ji Yoon, Wisnivesky Juan, McMichael John, Welch Nicole, Dasarathy Srinivasan, Zabor Emily C
Department of Gastroenterology, Hepatology, and Nutrition, Cleveland Clinic, Cleveland, Ohio.
Division of Gastroenterology, Icahn School of Medicine at Mount Sinai, New York, New York.
Gastro Hep Adv. 2024 Jul 14;3(7):910-916. doi: 10.1016/j.gastha.2024.07.001. eCollection 2024.
Gastric cancer (GC) is a leading cause of cancer incidence and mortality globally. Population screening is limited by the low incidence and prevalence of GC in the United States. A risk prediction algorithm that identifies high-risk patients could enable targeted GC screening. We aimed to determine the feasibility and performance of a logistic regression model based on electronic health records to identify individuals at high risk for noncardia gastric cancer (NCGC).
We included 614 patients who had a diagnosis of NCGC between ages 40 and 80 years and who were seen at our large, multistate tertiary medical center between 2010 and 2021. Controls without a diagnosis of NCGC were randomly selected in a 1:10 ratio of cases to controls. Missing data were handled with multiple imputation by chained equations, and logistic regression on the imputed datasets was used to estimate the probability of NCGC. The area under the curve and the 0.632 estimator were used to estimate discrimination.
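As a rough illustration of how a 0.632 estimate of discrimination can be computed, the sketch below blends the apparent AUC (the model scored on its own training data) with the mean out-of-bootstrap AUC, using weights 0.368 and 0.632. It is not the authors' code: the function names, the `fit` callback, the number of bootstrap replicates, and the seed are all assumptions, and because the abstract does not say how bootstrapping was combined with multiple imputation, imputation is left out.

```python
import random


def auc(scores, labels):
    """Mann-Whitney AUC: chance a random case outscores a random control.

    Ties count as half a win. Labels are 1 (case) or 0 (control).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def auc_632(X, y, fit, n_boot=200, seed=0):
    """0.632 bootstrap estimate of AUC (illustrative sketch).

    `fit(X, y)` is a hypothetical callback that returns a scoring
    function mapping one row of X to a risk score.
    """
    rng = random.Random(seed)
    n = len(y)

    # Apparent performance: fit and evaluate on the same data (optimistic).
    model = fit(X, y)
    apparent = auc([model(x) for x in X], y)

    oob_aucs = []
    for _ in range(n_boot):
        # Resample rows with replacement; rows never drawn are "out of bag".
        idx = [rng.randrange(n) for _ in range(n)]
        drawn = set(idx)
        oob = [i for i in range(n) if i not in drawn]
        # Skip replicates where either split lacks a case or a control.
        if len({y[i] for i in idx}) < 2 or len({y[i] for i in oob}) < 2:
            continue
        boot_model = fit([X[i] for i in idx], [y[i] for i in idx])
        oob_aucs.append(
            auc([boot_model(X[i]) for i in oob], [y[i] for i in oob])
        )

    if not oob_aucs:
        raise ValueError("no usable bootstrap replicates")
    oob_mean = sum(oob_aucs) / len(oob_aucs)

    # Weighted blend of optimistic and pessimistic estimates.
    return 0.368 * apparent + 0.632 * oob_mean
```

In practice `fit` would wrap a logistic regression from a statistics library. Keeping it as a callback means the resampling logic stays independent of the model.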
The 0.632 estimator value was 0.731, indicating robust model performance. Probability of NCGC was higher with increasing age (odds ratio [OR] = 1.16; 95% confidence interval [CI]: 1.04-1.30), male sex (OR = 1.97; 95% CI: 1.64-2.36), Black (OR = 3.07; 95% CI: 2.46-3.83) or Asian race (OR = 4.39; 95% CI: 2.60-7.42), tobacco use (OR = 1.61; 95% CI: 1.34-1.94), anemia (OR = 1.35; 95% CI: 1.09-1.68), and pernicious anemia (OR = 6.12; 95% CI: 3.42-10.95).
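To make the reported odds ratios more concrete, the sketch below shows how they would combine in a logistic model that has no interaction terms: log-odds add, so odds ratios multiply. Several things here are assumptions rather than details from the abstract: the absence of interactions, the dictionary keys, the helper names, and the baseline probability passed to the second function. Age is omitted because the abstract does not state the unit of increase behind its odds ratio. The model intercept is not reported either, and in a case-control sample it would not reflect population risk.

```python
import math

# Odds ratios for binary predictors, as reported in the abstract.
ODDS_RATIOS = {
    "male_sex": 1.97,
    "black_race": 3.07,
    "asian_race": 4.39,
    "tobacco_use": 1.61,
    "anemia": 1.35,
    "pernicious_anemia": 6.12,
}


def combined_odds_ratio(present):
    """Odds ratio versus a patient with none of the listed factors.

    Assumes a logistic model without interaction terms, so each
    coefficient ln(OR) is simply added on the log-odds scale.
    """
    log_odds_shift = sum(math.log(ODDS_RATIOS[name]) for name in present)
    return math.exp(log_odds_shift)


def probability_from_odds(baseline_probability, odds_ratio):
    """Apply an odds ratio to a (hypothetical) baseline probability."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    odds = baseline_odds * odds_ratio
    return odds / (1.0 + odds)


# Example: a male tobacco user relative to a female non-user,
# with all other predictors held equal.
relative_odds = combined_odds_ratio(["male_sex", "tobacco_use"])
```

Here `relative_odds` equals 1.97 × 1.61, or about 3.17. Turning that into an absolute probability requires a baseline risk from an external source, which is why `probability_from_odds` takes it as an explicit argument.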
We demonstrate the feasibility and good performance of an electronic health record-based logistic regression model for estimating the probability of NCGC. Future studies will refine and validate this model, ultimately identifying a high-risk cohort who could be eligible for NCGC screening.