Centre for Medical Informatics, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, UK.
Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, UK.
Eur J Epidemiol. 2019 Jun;34(6):557-565. doi: 10.1007/s10654-019-00499-1. Epub 2019 Feb 26.
Prospective, population-based studies that recruit participants in mid-life are valuable resources for dementia research. Follow-up in these studies is often through linkage to routinely collected healthcare datasets. We investigated the accuracy of these datasets for dementia case ascertainment in a validation study using data from UK Biobank, an open-access, population-based study of >500,000 adults aged 40-69 years at recruitment in 2006-2010. From 17,198 UK Biobank participants recruited in Edinburgh, we identified those with ≥1 dementia code in their linked primary care, hospital admissions or mortality data and compared their coded diagnoses with clinical expert adjudication of their full-text medical record. We calculated the positive predictive value (PPV, the proportion of cases identified that were true positives) for all-cause dementia, Alzheimer's disease and vascular dementia for each dataset alone and in combination, and explored algorithmic code combinations to improve PPV. Among 120 participants, PPVs for all-cause dementia were 86.8%, 87.3% and 80.0% for primary care, hospital admissions and mortality data respectively, and 82.5% across all datasets. We identified three algorithms that balanced a high PPV with reasonable case ascertainment. For Alzheimer's disease, PPVs were 74.1% for primary care, 68.2% for hospital admissions, 50.0% for mortality data and 71.4% in combination. PPV for vascular dementia was 43.8% across all sources. UK routinely collected healthcare data can be used to identify all-cause dementia in prospective studies. PPVs for Alzheimer's disease and vascular dementia are lower. Further research is required to explore the geographic generalisability of these findings.
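The PPV defined in the abstract can be illustrated with a minimal Python sketch. The abstract reports percentages only, so the count used here (99 adjudicated true positives among the 120 code-identified participants) is an assumption back-calculated from the reported 82.5%, not a figure taken from the paper. The function names `ppv` and `wilson_interval` are hypothetical, and the Wilson score interval is one common choice for a binomial proportion; the abstract does not state which interval method, if any, the authors used.

```python
import math


def ppv(true_positives: int, flagged: int) -> float:
    """Positive predictive value: confirmed cases / all code-identified cases."""
    if flagged <= 0:
        raise ValueError("flagged must be a positive count")
    if not 0 <= true_positives <= flagged:
        raise ValueError("true_positives must lie between 0 and flagged")
    return true_positives / flagged


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if n <= 0:
        raise ValueError("n must be a positive count")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Assumed counts: 99 of 120 is consistent with the reported 82.5%,
# but the abstract itself does not give the numerator.
flagged = 120
confirmed = 99

estimate = ppv(confirmed, flagged)
lower, upper = wilson_interval(confirmed, flagged)
print(f"PPV = {100 * estimate:.1f}%")
print(f"approx. 95% interval: {100 * lower:.1f}% to {100 * upper:.1f}%")
```

Note that PPV is computed only among participants flagged by at least one dementia code, so it says nothing about cases the codes missed; that is why the abstract discusses case ascertainment separately.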