Importance of Diagnostic Accuracy in Big Data: False-Positive Diagnoses of Type 2 Diabetes in Health Insurance Claims Data of 70 Million Germans.

Author information

Brinks Ralph, Tönnies Thaddäus, Hoyer Annika

Affiliations

Chair for Medical Biometry and Epidemiology, Faculty of Health, School of Medicine, Witten/Herdecke University, Witten, Germany.

Institute for Biometry and Epidemiology, German Diabetes Center, Düsseldorf, Germany.

Publication information

Front Epidemiol. 2022 May 23;2:887335. doi: 10.3389/fepid.2022.887335. eCollection 2022.

Abstract

Large data sets comprising diagnoses of chronic conditions are becoming increasingly available for research purposes. In Germany, aggregated claims data from the statutory health insurance, including medical diagnoses and covering roughly 70 million insurants, are planned to be published regularly. The validity of the diagnoses in such big datasets can hardly be assessed directly. If the dataset comprises prevalence, incidence, and mortality, however, the proportion of false-positive diagnoses can be estimated using mathematical relations from the illness-death model. We apply the method to age-specific aggregated claims data on type 2 diabetes from 70 million Germans, stratified by sex, and report the findings as the age-specific ratio of false-positive diagnoses of type 2 diabetes (FPR) in the dataset. The FPR changes with age in both men and women. In men, the FPR increases linearly from 1 to 3 per 1,000 between 30 and 50 years of age. Between 50 and 80 years of age, it remains below 4 per 1,000. After 80 years of age, it increases to approximately 5 per 1,000. In women, the FPR increases steeply from age 30 to 60 years and peaks at approximately 12 per 1,000 between 60 and 70 years of age. After age 70 years, the FPR in women drops sharply. In all age groups, the FPR is higher in women than in men. In absolute numbers, we find 217,000 people with a false-positive diagnosis in the dataset (95% confidence interval, CI: 204,000-229,000), the vast majority being women (172,000, 95% CI: 162,000-180,000). Our work indicates that possible false-positive (and false-negative) diagnoses in claims data should be dealt with appropriately, for example, by including age- and sex-specific error terms in statistical models, to avoid potentially biased or wrong conclusions.
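The abstract does not state the estimator itself. As a rough illustration of the kind of relation the illness-death model provides between prevalence, incidence, and mortality, the sketch below integrates the age-only form of the model's prevalence equation, dp/da = (1 - p) * (i - p * (m1 - m0)), with a forward-Euler step. The function name `prevalence_by_age` and all rate functions are hypothetical, the rates are made-up values rather than figures from the claims data, and the code is not the authors' FPR procedure. It only shows how incidence and mortality together imply a prevalence curve, against which an observed prevalence could in principle be compared.

```python
import math


def prevalence_by_age(ages, incidence, mort_healthy, mort_diseased, p0=0.0):
    """Forward-Euler solution of dp/da = (1 - p) * (i - p * (m1 - m0)).

    ages          -- increasing list of ages (years) at which to evaluate p
    incidence     -- function age -> incidence rate i (per person-year)
    mort_healthy  -- function age -> mortality rate m0 of people without the disease
    mort_diseased -- function age -> mortality rate m1 of people with the disease
    p0            -- prevalence at the first age in `ages`
    """
    prevalence = [p0]
    for a_now, a_next in zip(ages[:-1], ages[1:]):
        p = prevalence[-1]
        excess = mort_diseased(a_now) - mort_healthy(a_now)
        dp_da = (1.0 - p) * (incidence(a_now) - p * excess)
        prevalence.append(p + (a_next - a_now) * dp_da)
    return prevalence


# Hypothetical, illustrative rates (per person-year); NOT taken from the paper.
def incidence(age):
    return 0.0005 * math.exp(0.06 * (age - 30))


def mort_healthy(age):
    return 0.0005 * math.exp(0.09 * (age - 30))


def mort_diseased(age):
    return 1.8 * mort_healthy(age)  # assumed constant mortality rate ratio


ages = [30 + 0.1 * k for k in range(601)]  # ages 30 to 90 in steps of 0.1 years
curve = prevalence_by_age(ages, incidence, mort_healthy, mort_diseased)

for age in (40, 60, 80):
    index = round((age - 30) / 0.1)
    print(f"age {age}: model-implied prevalence {curve[index]:.4f}")
```

In a setting like the one described, a reported prevalence that sits above the curve implied by the reported incidence and mortality would hint at surplus (possibly false-positive) diagnoses; how that gap is turned into an FPR with confidence intervals is specified in the paper, not here.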

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/01fa/10911003/f8990e30adf5/fepid-02-887335-g0001.jpg
