
Communicating exploratory unsupervised machine learning analysis in age clustering for paediatric disease.

Affiliations

DRIVE, Great Ormond Street Hospital for Children, London, UK.

NIHR GOSH BRC, London, UK.

Publication information

BMJ Health Care Inform. 2024 Jul 29;31(1):e100963. doi: 10.1136/bmjhci-2023-100963.

Abstract

BACKGROUND

Despite the increasing availability of electronic healthcare record (EHR) data and the wide availability of plug-and-play machine learning (ML) application programming interfaces, the adoption of data-driven decision-making within routine hospital workflows has so far remained limited. Through the lens of deriving clusters of diagnoses by age, this study investigated the types of ML analysis that can be performed using EHR data and how the results could be communicated to lay stakeholders.

METHODS

Observational EHR data from a tertiary paediatric hospital were used; after preprocessing, the dataset contained 61 522 unique patients and 3315 unique ICD-10 diagnosis codes. K-means clustering was applied to identify age distributions of patient diagnoses. The final model was selected using quantitative metrics and expert assessment of the clinical validity of the clusters. Additionally, the uncertainty arising from preprocessing decisions was analysed.
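As an illustration of the clustering step, the following is a minimal sketch that groups diagnosis codes by the shape of their age distributions. It is not the authors' pipeline: the one-year age bins, the diagnosis labels (`DX_A` to `DX_D`), the toy ages, the choice of k = 2 and the use of inertia as the quantitative metric are all assumptions made for the example, and K-means is written out in plain Python rather than taken from a library.

```python
import random

AGE_BINS = 18  # assumption: one bin per year of age, 0..17


def age_profile(ages):
    """Normalised histogram of ages (in years) for one diagnosis code."""
    counts = [0] * AGE_BINS
    for age in ages:
        counts[min(int(age), AGE_BINS - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def sq_dist(u, v):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every profile.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: mean profile of each cluster (keep old one if empty).
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squares, one possible quantitative metric."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical toy data: ages at diagnosis for four made-up codes.
toy = {
    "DX_A": [0, 0, 0, 1, 0],
    "DX_B": [0, 1, 0, 0],
    "DX_C": [14, 15, 16, 17, 15],
    "DX_D": [13, 16, 17, 17],
}
codes = list(toy)
profiles = [age_profile(toy[c]) for c in codes]
labels, centroids = kmeans(profiles, k=2)
for code, lab in zip(codes, labels):
    print(code, "-> cluster", lab)
print("inertia:", round(inertia(profiles, labels, centroids), 3))
```

In practice the number of clusters would be varied and the metric compared across runs, with the resulting clusters then reviewed for clinical validity, as the abstract describes.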

FINDINGS

Four age clusters of diseases were identified, broadly aligning to the age ranges 0 to 1, 1 to 5, 5 to 13 and 13 to 18 years. Diagnoses within the clusters aligned with existing knowledge regarding the propensity of presentation at different ages, and sequential clusters reflected known disease progressions. The results validated similar methodologies within the literature. The impact of uncertainty induced by preprocessing decisions was large at the level of individual diagnoses but not at the population level. Strategies for mitigating, or communicating, this uncertainty were successfully demonstrated.
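One way to picture how a preprocessing decision can move an individual diagnosis between age groups is sketched below. The abstract does not say which preprocessing decisions were examined, so the choice compared here (counting every record versus only the earliest record per patient and diagnosis) is purely an assumption, as are the record layout, the diagnosis labels and the use of the median age. The band edges are taken from the four age ranges reported above.

```python
from statistics import median

# Age bands (in years) taken from the four clusters reported in the findings.
BANDS = [(0, 1), (1, 5), (5, 13), (13, 18)]


def band_of(age):
    """Index of the age band containing `age` (last band if out of range)."""
    for i, (lo, hi) in enumerate(BANDS):
        if lo <= age < hi:
            return i
    return len(BANDS) - 1


def first_per_patient(records):
    """Hypothetical preprocessing variant: keep only the earliest record
    for each (patient, diagnosis) pair."""
    earliest = {}
    for patient, dx, age in records:
        key = (patient, dx)
        if key not in earliest or age < earliest[key]:
            earliest[key] = age
    return [(p, dx, age) for (p, dx), age in earliest.items()]


def band_per_diagnosis(records):
    """Assign each diagnosis to the band containing its median age."""
    ages = {}
    for _, dx, age in records:
        ages.setdefault(dx, []).append(age)
    return {dx: band_of(median(a)) for dx, a in ages.items()}


# Hypothetical toy records: (patient id, diagnosis label, age in years).
records = [
    ("p1", "DX_A", 0.5), ("p1", "DX_A", 3.0), ("p1", "DX_A", 4.0),
    ("p2", "DX_A", 0.8),
    ("p3", "DX_B", 14.0), ("p4", "DX_B", 15.0),
]

all_records = band_per_diagnosis(records)
first_only = band_per_diagnosis(first_per_patient(records))
changed = [dx for dx in all_records if all_records[dx] != first_only[dx]]
print("bands using all records:  ", all_records)
print("bands using first records:", first_only)
print("diagnoses that changed band:", changed)
```

In this toy case the repeated follow-up records for one patient shift `DX_A` from the 0 to 1 band into the 1 to 5 band, while `DX_B` is unaffected. Listing the diagnoses whose band changes is one simple way such sensitivity could be reported alongside the clusters.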

CONCLUSION

Unsupervised ML applied to EHR data identifies clinically relevant age distributions of diagnoses that can augment existing decision-making. However, biases within healthcare datasets dramatically impact the results if they are not appropriately mitigated or communicated.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0a2/11288139/ef8d2dead54b/bmjhci-31-1-g001.jpg
