Zeynalova Samira, Worringen Peter, Bassler Stefan, Martin Anja, Czech Katrin, Greulich Lars, Reusche Matthias, Enders Ute, Reyes Nigar, Yahiaoui-Doktor Maryam, Collier Matthias, Loeffler Markus, Stegmann Tina
Institute for Medical Informatics, Statistics and Epidemiology (IMISE), Leipzig University, 04107, Leipzig, Germany.
Leipzig Research Centre for Civilization Diseases (LIFE), 04103, Leipzig, Germany.
Arch Public Health. 2025 May 7;83(1):124. doi: 10.1186/s13690-025-01606-3.
Self-reporting is a common approach in observational epidemiological studies. However, self-reported information can be biased for several reasons, which can in turn affect study results. This analysis aimed to evaluate the agreement between self-reported data from a population-based cohort study and data from two large German health insurance companies.
Participants with available self-reported diagnoses of a history of stroke, atrial fibrillation (AF), heart failure (HF), and myocardial infarction (MI) from the baseline and follow-up (after six years) surveys of the prospective population-based LIFE-Adult study were included in this study. Two health insurance companies provided ICD-10-GM codes. The agreement between the self-reports and health insurance data (HID) was examined by calculating sensitivity, specificity, Cohen's kappa, and positive and negative predictive values. We used multivariable logistic regression models to examine whether odds ratios (OR) for the association between risk factors and each disease changed depending on whether self-reports or HID were used as the dependent variable.
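The agreement measures named above can all be derived from a 2x2 table that cross-classifies self-report against HID. The following is a minimal sketch, assuming HID is treated as the reference standard; the function name `agreement_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def agreement_metrics(tp, fp, fn, tn):
    """Agreement between self-report and a reference source (here assumed to be HID).

    tp: self-report yes, reference yes
    fp: self-report yes, reference no
    fn: self-report no,  reference yes
    tn: self-report no,  reference no
    """
    n = tp + fp + fn + tn

    sensitivity = tp / (tp + fn)  # share of reference cases that were self-reported
    specificity = tn / (tn + fp)  # share of reference non-cases not self-reported
    ppv = tp / (tp + fp)          # share of self-reported cases confirmed by reference
    npv = tn / (tn + fn)          # share of self-reported non-cases confirmed by reference

    # Cohen's kappa: observed agreement corrected for chance agreement
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only (not study data)
example = agreement_metrics(tp=20, fp=5, fn=10, tn=65)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

A rare condition can show high specificity and high negative predictive value while sensitivity remains low, which is why the measures are reported side by side rather than summarized by a single number.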
A total of 1,784 individuals with complete data were included in this interim analysis. Mean age was 58 (SD ±12) years and 984 (55%) were female. Fifty-two (2.9%) subjects reported a history of stroke, 99 (5.6%) AF, 63 (3.5%) HF, and 46 (2.6%) MI. Compared with the HID, a high specificity was found for all four diagnoses (stroke: 99% [95% CI 99.3-99.9]; AF: 99% [95% CI 98.1-99.2]; HF: 98% [95% CI 97.6-98.9]; and MI: 99% [95% CI 98.9-99.7]). Sensitivity was 58% (95% CI 47.4-69.5) for stroke, 61% (95% CI 48.8-74.0) for MI, and 65% (95% CI 56.6-73.9) for AF. Sensitivity was lowest for HF (20% [95% CI 14.4-26.5]).
The use of German health insurance data is a feasible method for verifying population-based self-reported diagnoses. Sensitivity varied among the self-reported diseases when compared with the health insurance data, whereas specificity was consistently high. Verification of self-reported diagnoses using health insurance data as an additional data source may be considered in future population-based assessments to reduce misclassification error in self-reported data.