Institute for Sociology and Demography, University of Rostock, Rostock, Germany.
German Center for Neurodegenerative Diseases, Bonn, Germany.
Alzheimers Dement. 2023 Feb;19(2):477-486. doi: 10.1002/alz.12663. Epub 2022 Apr 22.
We examined whether German claims data are suitable for dementia risk prediction, how machine learning (ML) compares to classical regression, and what the important predictors for dementia risk are.
We analyzed data from the largest German health insurance company, including 117,895 dementia-free people aged 65 years and older. Follow-up was 10 years. Predictors were 23 age-related diseases, 212 medical prescriptions, 87 surgery codes, age, and sex. Statistical methods included logistic regression (LR), gradient boosting (GBM), and random forests (RFs).
Discriminatory power was moderate for LR (C-statistic = 0.714; 95% confidence interval [CI] = 0.708-0.720) and GBM (C-statistic = 0.707; 95% CI = 0.700-0.713) and lower for RF (C-statistic = 0.636; 95% CI = 0.628-0.643). GBM had the best model calibration. We identified antipsychotic medications and cerebrovascular disease as important predictors, along with a less-established one: a specific antibacterial medical prescription.
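For readers unfamiliar with the C-statistic, the following is a minimal sketch of how such a value and a 95% interval can be computed from predicted risks and observed outcomes. It is illustrative only: the abstract does not state how the reported confidence intervals were obtained, so the percentile bootstrap used here is an assumption, and the function names (c_statistic, bootstrap_ci) and the toy data are hypothetical.

```python
import random


def c_statistic(y_true, y_score):
    """Share of (case, non-case) pairs in which the case has the higher score.

    Ties in score count as half a concordant pair.
    """
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    non_cases = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


def bootstrap_ci(y_true, y_score, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the C-statistic (assumed method)."""
    if len(set(y_true)) < 2:
        raise ValueError("need at least one case and one non-case")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one outcome class; draw again
        stats.append(c_statistic(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data (hypothetical): 1 = developed dementia, 0 = did not.
outcomes = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
print(c_statistic(outcomes, risks))  # 3 of 4 pairs concordant -> 0.75
```

The pairwise loop is quadratic in sample size, so a cohort of the size described here would call for a rank-based formulation instead; the sketch favors readability over speed.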
Our models from German claims data have acceptable accuracy and may provide cost-effective decision support for early dementia screening.