Data quality and autism: Issues and potential impacts.

Author information

Heyl Johannes, Hardy Flavien, Tucker Katie, Hopper Adrian, Marchã Maria J, Liew Ashley, Reep Judith, Harwood Kerry-Anne, Roberts Luke, Yates Jeremy, Day Jamie, Wheeler Andrew, Eve-Jones Sue, Briggs Tim W R, Gray William K

Affiliations

Getting It Right First Time, NHS England and NHS Improvement, London, UK; Department of Physics and Astronomy, University College London, London, UK.

Getting It Right First Time, NHS England and NHS Improvement, London, UK.

Publication information

Int J Med Inform. 2023 Feb;170:104938. doi: 10.1016/j.ijmedinf.2022.104938. Epub 2022 Nov 28.

Abstract

INTRODUCTION

Large healthcare datasets can provide insight that has the potential to improve outcomes for patients. However, it is important to understand the strengths and limitations of such datasets so that the insights they provide are accurate and useful. The aim of this study was to identify data inconsistencies within the Hospital Episode Statistics (HES) dataset for autistic patients and assess potential biases introduced through these inconsistencies and their impact on patient outcomes. The study can only identify inconsistencies in the recording of an autism diagnosis; it cannot determine whether the inclusion or the exclusion of the diagnosis is the error.

METHODS

Data were extracted from the HES database for the period 1st April 2013 to 31st March 2021 for patients with a diagnosis of autism. The first spell in hospital during the study period was identified for each patient and linked to any subsequent spell in hospital for the same patient. A data inconsistency was recorded where autism was not recorded as a diagnosis in a subsequent spell. Features associated with data inconsistencies were identified using a random forest classifier and regression modelling.
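The linkage step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the record layout (`patient_id`, `admitted`, `diagnoses`), the function names, and the use of the ICD-10 prefix `F84` to mark an autism diagnosis are all hypothetical, since the abstract does not list the codes or fields used.

```python
from datetime import date

# Assumption: any ICD-10 code starting with "F84" counts as an autism diagnosis.
# The abstract does not state which codes the study actually used.
AUTISM_PREFIX = "F84"


def has_autism(spell):
    """True if any diagnosis code on the spell matches the assumed prefix."""
    return any(code.startswith(AUTISM_PREFIX) for code in spell["diagnoses"])


def flag_inconsistencies(spells):
    """Return (patient_id, admitted) for later spells that lack an autism code.

    Only patients whose first spell in the data carries an autism code are
    considered, mirroring the cohort described in the abstract.
    """
    # Group spells by patient.
    by_patient = {}
    for spell in spells:
        by_patient.setdefault(spell["patient_id"], []).append(spell)

    flagged = []
    for patient_id, patient_spells in by_patient.items():
        # Order each patient's spells by admission date.
        patient_spells.sort(key=lambda s: s["admitted"])
        first, later = patient_spells[0], patient_spells[1:]
        if not has_autism(first):
            continue  # not in the cohort under this sketch's assumption
        for spell in later:
            if not has_autism(spell):
                flagged.append((patient_id, spell["admitted"]))
    return flagged


# Toy records, invented purely for illustration.
spells = [
    {"patient_id": 1, "admitted": date(2014, 5, 1), "diagnoses": ["F840", "J45"]},
    {"patient_id": 1, "admitted": date(2016, 2, 3), "diagnoses": ["J45"]},
    {"patient_id": 1, "admitted": date(2018, 7, 9), "diagnoses": ["F840"]},
    {"patient_id": 2, "admitted": date(2015, 1, 1), "diagnoses": ["I10"]},
    {"patient_id": 2, "admitted": date(2015, 6, 1), "diagnoses": ["F840"]},
]
inconsistent = flag_inconsistencies(spells)
```

In this toy data only patient 1's second spell is flagged. Patient 2 is skipped because the first spell carries no autism code. As the abstract notes, a flag of this kind marks a disagreement between spells and says nothing about which record is wrong.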

RESULTS

Data were available for 172,324 unique patients who had been recorded as having an autism diagnosis on first admission. In total, 43.7% of subsequent spells were found to have inconsistencies. The features most strongly associated with inconsistencies included greater age, greater deprivation, longer time since the first spell, change in provider, shorter length of stay, being female and a change in the main specialty description. The random forest algorithm had an area under the receiver operating characteristic curve of 0.864 (95% CI [0.862-0.866]) in predicting a data inconsistency. For patients who died in hospital, inconsistencies in their final spell were significantly associated with being 80 years and over, being female, greater deprivation and use of a palliative care code in the death spell.
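For readers less familiar with the metric, the area under the receiver operating characteristic curve can be read as the probability that a randomly chosen inconsistent spell receives a higher predicted score than a randomly chosen consistent one. The sketch below computes it directly from that definition. The function name `roc_auc` and the toy labels and scores are invented for illustration and have no connection to the study data.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = inconsistent spell, 0 = consistent spell.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
auc = roc_auc(labels, scores)  # 5 of 6 pairs ranked correctly
```

The pairwise loop is quadratic in the number of spells, so it suits small illustrations only. For data on the scale reported here, a rank-based or library implementation would be the practical choice.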

CONCLUSIONS

Data inconsistencies in the HES database were relatively common in autistic patients and were associated with a number of patient and hospital admission characteristics. Such inconsistencies have the potential to distort our understanding of service use in key demographic groups.
