

External validation of an artificial intelligence multi-label deep learning model capable of ankle fracture classification.

Affiliations

Danderyd University Hospital, Karolinska Institute, Stockholm, Sweden.

Flinders University and Medical Centre, Adelaide, South Australia, Australia.

Publication information

BMC Musculoskelet Disord. 2024 Oct 4;25(1):788. doi: 10.1186/s12891-024-07884-2.

Abstract

BACKGROUND

Advances in medical imaging have made it possible to classify ankle fractures using Artificial Intelligence (AI). Recent studies have demonstrated good internal validity for machine learning algorithms using the AO/OTA 2018 classification. This study aimed to externally validate one such model for ankle fracture classification and to explore ways to improve external validity.

METHODS

In this retrospective observational study, we trained a deep-learning neural network on 7,500 ankle studies to classify traumatic malleolar fractures according to the AO/OTA classification. Our internal validation dataset (IVD) contained 409 studies collected from Danderyd Hospital in Stockholm, Sweden, between 2002 and 2016. The external validation dataset (EVD) contained 399 studies collected from Flinders Medical Centre, Adelaide, Australia, between 2016 and 2020. Our primary outcome measures were the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPR) for classification of AO/OTA malleolar (44) fractures. Secondary outcomes were performance on other fractures visible on ankle radiographs and inter-observer reliability of reviewers.
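The two primary outcome measures can be illustrated for a single fracture label. The following is a minimal sketch in plain Python, not the study's evaluation code: the function names and the toy labels and scores are assumptions made for illustration, and AUPR is approximated here by average precision, which is one common estimator and may differ from the one the authors used.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.
    Cases are ranked by descending score; ties keep their input order."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


# Toy data (hypothetical): 1 = fracture class present, 0 = absent.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_aupr = average_precision(toy_labels, toy_scores)
```

AUC depends only on how positives rank against negatives, whereas the precision-recall estimate also reflects how rare the class is, which is why the two can diverge for uncommon fracture subgroups.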

RESULTS

For fracture detection, the network attained a weighted mean AUC (wAUC) of 0.86 (95% CI 0.82-0.89) in the EVD, compared with 0.95 (95% CI 0.94-0.97) in the IVD. The area under the precision-recall curve (AUPR) was 0.93 vs. 0.96. The wAUC for individual outcomes (type 44A-C, group 44A1-C3, and subgroup 44A1.1-C3.3) was 0.82 for the EVD and 0.93 for the IVD. The weighted mean AUPR (wAUPR) was 0.59 vs. 0.63. Throughout, the performance was superior to that of a random classifier for the EVD.
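To make the weighted means and the random-classifier comparison concrete, here is a minimal sketch. It assumes that each class's metric is weighted by its number of positive cases; the abstract does not state the weighting scheme, so that choice, the function names, and all numbers in the example are hypothetical. The baseline functions rely on two standard facts: a random classifier has an expected AUC of 0.5, and its expected precision equals the class prevalence at every recall level.

```python
def weighted_mean(values, weights):
    """Weighted mean of per-class metrics (e.g. per-class AUC)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def random_baseline_auc():
    """Expected AUC of a classifier that scores cases at random."""
    return 0.5


def random_baseline_aupr(n_positive, n_total):
    """Expected AUPR of a random classifier: the class prevalence."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_positive / n_total


# Hypothetical per-class results, NOT taken from the study.
class_auc = [0.9, 0.7]      # per-class AUC
class_positives = [30, 10]  # positive cases per class (assumed weights)
w_auc = weighted_mean(class_auc, class_positives)

# A class present in 10 of 400 studies has a random AUPR baseline of 0.025.
rare_class_baseline = random_baseline_aupr(10, 400)
```

Under this reading, an AUPR that looks modest in absolute terms can still be well above chance when the underlying fracture classes are rare.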

CONCLUSION

Although the two datasets had considerable differences, the model transferred well to the EVD and the alternative clinical scenario it represents. The direct clinical implications of this study are that algorithms developed elsewhere need local validation and that discrepancies can be rectified using targeted training. In a wider sense, we believe this opens up possibilities for building advanced treatment recommendations based on exact fracture types, recommendations that would be more objective than current clinical decisions, which are often influenced by who is present during rounds.

