Zucker K, McInerney C, Glaser A, Baxter P, Hall G
Leeds Institute of Data Analytics, University of Leeds, Leeds, UK.
School of Medicine, University of Leeds, Leeds, UK.
Br J Cancer. 2025 Aug 9. doi: 10.1038/s41416-025-03136-9.
Significant volumes of research rely on secondary care diagnostic coding to identify comorbidities; however, little is known about its accuracy at a population level or whether inaccuracies influence subsequent analyses.
Retrospective observational study utilising real-world data for three cohorts (all cancers, prostate cancer and breast cancer) of patients diagnosed at Leeds Cancer Centre between 2005 and 2018. Three different data definitions were used to identify patients with diabetes in each cohort: (1) clinical coding alone, (2) HbA1c blood test alone, and (3) either clinical coding or abnormal HbA1c. Cohort characteristics, diagnosis dates and Cox-derived survival were compared across diabetes definitions.
A total of 123,841 cancer patients were identified, including 13,964 with diabetes. Clinical coding failed to identify 14.6% of diabetic cancer patients and had a temporal misclassification rate of 17.5%. Sole reliance on clinical coding overestimated the negative effect of diabetes mellitus (DM) on median survival across all cancers, and by 3.17 years in breast cancer.
Clinical coding provides inaccurate diabetes detection and diagnosis dates, resulting in meaningful differences in analytic outcomes. This supports the use of more detailed comorbidity data definitions. The results cast doubt on research reliant on hospital clinical coding alone and on the generalisability of some comorbidity and frailty scoring systems.
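The three diabetes definitions compared in the study can be illustrated with a minimal Python sketch. This is not the authors' code: the `Patient` record, its field names and the helper functions are hypothetical, and the abstract does not state what counts as an "abnormal" HbA1c, so the WHO diagnostic cut-off of 48 mmol/mol (6.5%) is assumed here. The denominator used for the "missed by coding" fraction (patients meeting definition 3) is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: WHO diagnostic cut-off; the abstract does not give the threshold used.
HBA1C_THRESHOLD_MMOL_MOL = 48.0


@dataclass
class Patient:
    """Hypothetical patient record for illustration only."""
    has_diabetes_code: bool                # a diabetes code exists in hospital coding
    max_hba1c_mmol_mol: Optional[float]    # highest recorded HbA1c, None if never tested


def diabetic_by_coding(p: Patient) -> bool:
    """Definition 1: clinical coding alone."""
    return p.has_diabetes_code


def diabetic_by_hba1c(p: Patient) -> bool:
    """Definition 2: HbA1c blood test alone."""
    return (
        p.max_hba1c_mmol_mol is not None
        and p.max_hba1c_mmol_mol >= HBA1C_THRESHOLD_MMOL_MOL
    )


def diabetic_by_either(p: Patient) -> bool:
    """Definition 3: clinical coding or abnormal HbA1c."""
    return diabetic_by_coding(p) or diabetic_by_hba1c(p)


def fraction_missed_by_coding(patients: List[Patient]) -> float:
    """Share of patients diabetic under definition 3 who have no diabetes code."""
    diabetic = [p for p in patients if diabetic_by_either(p)]
    if not diabetic:
        return 0.0
    missed = sum(1 for p in diabetic if not diabetic_by_coding(p))
    return missed / len(diabetic)


# Toy cohort with made-up values, not study data.
toy_cohort = [
    Patient(True, 60.0),    # coded and abnormal HbA1c
    Patient(True, None),    # coded, never tested
    Patient(False, 55.0),   # abnormal HbA1c only: missed by coding
    Patient(False, 40.0),   # not diabetic under any definition
    Patient(True, 30.0),    # coded, HbA1c below the threshold
]
missed_fraction = fraction_missed_by_coding(toy_cohort)
```

In this toy cohort, four patients meet definition 3 and one of them has no diabetes code, so the missed fraction is 0.25. Keeping each definition as its own predicate lets the same cohort be re-analysed under all three definitions.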