Getting It Right First Time, NHS England and NHS Improvement London, London, UK
Department of Physics and Astronomy, University College London, London, UK.
BMJ Health Care Inform. 2022 Oct;29(1). doi: 10.1136/bmjhci-2022-100633.
To gain maximum insight from large administrative healthcare datasets, it is important to understand their data quality. Although a gold standard against which to assess criterion validity rarely exists for such datasets, internal consistency can be evaluated. We aimed to identify inconsistencies in the recording of mandatory International Statistical Classification of Diseases and Related Health Problems, tenth revision (ICD-10) codes within the Hospital Episode Statistics dataset in England.
Three exemplar medical conditions for which recording is mandatory once diagnosed were chosen: autism, type II diabetes mellitus and Parkinson's disease dementia. For each patient, we identified the first occurrence of the condition's ICD-10 code during the period April 2013 to March 2021 and then examined whether the code was recorded in subsequent hospital spells. We designed and trained random forest classifiers to identify variables strongly associated with recording inconsistencies.
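The consistency check described above can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout (`patient_id`, `admitted`, `codes`), the function name `inconsistency_rate` and the toy data are all hypothetical. The prefix `E11` is the ICD-10 category for type 2 diabetes mellitus, though the exact code list used in the study is not given here. A subsequent spell is counted as inconsistent when it follows the patient's first coded spell but lacks the condition code.

```python
from collections import defaultdict
from datetime import date


def inconsistency_rate(spells, code_prefix):
    """Return (inconsistent, subsequent) spell counts for one condition.

    `spells` is a list of dicts with hypothetical keys:
    patient_id, admitted (date), codes (set of ICD-10 strings).
    """
    by_patient = defaultdict(list)
    for spell in spells:
        by_patient[spell["patient_id"]].append(spell)

    inconsistent = subsequent = 0
    for patient_spells in by_patient.values():
        patient_spells.sort(key=lambda s: s["admitted"])
        diagnosed = False
        for spell in patient_spells:
            coded = any(c.startswith(code_prefix) for c in spell["codes"])
            if not diagnosed:
                # Spells before the first coded diagnosis, and the first
                # coded spell itself, are not counted as subsequent spells.
                diagnosed = coded
                continue
            subsequent += 1
            if not coded:
                inconsistent += 1
    return inconsistent, subsequent


# Toy data (invented for illustration only).
spells = [
    {"patient_id": "A", "admitted": date(2014, 5, 1), "codes": {"E11.9", "I10"}},
    {"patient_id": "A", "admitted": date(2015, 2, 3), "codes": {"I10"}},
    {"patient_id": "A", "admitted": date(2016, 7, 9), "codes": {"E11.9"}},
    {"patient_id": "B", "admitted": date(2013, 6, 1), "codes": {"J18.9"}},
    {"patient_id": "B", "admitted": date(2017, 1, 1), "codes": {"E11.9"}},
]

inconsistent, subsequent = inconsistency_rate(spells, "E11")
print(f"{inconsistent} of {subsequent} subsequent spells lack the code")
```

In this toy example, patient A has two spells after the first coded diagnosis, one of which omits the code; patient B's only coded spell is the first one, so it contributes no subsequent spells.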
For autism, diabetes and Parkinson's disease dementia respectively, 43.7%, 8.6% and 31.2% of subsequent spells had inconsistencies. Coding inconsistencies were highly correlated with non-coding of an underlying condition, a change in hospital trust and greater time between the spell with the first coded diagnosis and the subsequent spell. For patients with diabetes or Parkinson's disease dementia, spells without an overnight stay had a higher rate of coding inconsistencies.
Data inconsistencies are relatively common for the three conditions considered. Where these mandatory diagnoses are not recorded in administrative datasets, and where clinical decisions are made based on such data, there is potential for this to impact patient care.