Use of deep learning model for paediatric elbow radiograph binomial classification: initial experience, performance and lessons learnt.

Author information

Tan Mark Bangwei, Chua Yuezhi Russ, Fan Qiao, Fortier Marielle Valerie, Chang Peiqi Pearlly

Affiliations

Department of Diagnostic Radiology, Singapore General Hospital, Singapore.

Agency for Science, Technology and Research, Singapore.

Publication information

Singapore Med J. 2025 Apr 1;66(4):208-214. doi: 10.4103/singaporemedj.SMJ-2022-078. Epub 2023 Nov 29.

Abstract

INTRODUCTION

In this study, we aimed to compare the performance of a convolutional neural network (CNN)-based deep learning model that was trained on a dataset of normal and abnormal paediatric elbow radiographs with that of paediatric emergency department (ED) physicians on a binomial classification task.

METHODS

A total of 1,314 paediatric elbow lateral radiographs (patient mean age 8.2 years) were retrospectively retrieved and classified, based on annotation, as normal or abnormal (with pathology). They were then randomly partitioned into a development set (993 images); first and second tuning (validation) sets (109 and 100 images, respectively); and a test set (112 images). An artificial intelligence (AI) model was trained on the development set using the EfficientNet B1 network architecture. Its performance on the test set was compared with that of five physicians (inter-rater agreement: fair). The performance of the AI model and that of the physician group were compared using the McNemar test.
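The two mechanical steps in these Methods, the random partition and the paired McNemar comparison, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`partition`, `discordant_counts`, `mcnemar_exact`, `mcnemar_chi2`) are hypothetical; the split is drawn per image with a fixed seed (the abstract does not say whether images from the same patient were kept together); and each reader is assumed to give one normal/abnormal call per image (how the five physicians' calls were pooled into a group result is not stated). The abstract also does not say whether the exact or the chi-square form of the McNemar test was used, so both are shown. To compare sensitivity or specificity separately, one common approach (assumed here) is to restrict the paired calls to the abnormal or the normal images, respectively.

```python
import random
from math import comb, erfc, sqrt


def partition(items, sizes, seed=0):
    """Shuffle items reproducibly, then cut them into consecutive splits of the given sizes."""
    items = list(items)
    if sum(sizes) != len(items):
        raise ValueError("split sizes must add up to the number of items")
    random.Random(seed).shuffle(items)
    splits, start = [], 0
    for size in sizes:
        splits.append(items[start:start + size])
        start += size
    return splits


def discordant_counts(truth, pred_a, pred_b):
    """Return (b, c): b = images only reader A got right, c = images only reader B got right."""
    if not len(truth) == len(pred_a) == len(pred_b):
        raise ValueError("label lists must have equal length")
    b = c = 0
    for label, ya, yb in zip(truth, pred_a, pred_b):
        a_correct, b_correct = ya == label, yb == label
        if a_correct and not b_correct:
            b += 1
        elif b_correct and not a_correct:
            c += 1
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value: binomial test of the discordant pairs at p = 0.5."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mcnemar_chi2(b, c):
    """Asymptotic McNemar p-value with continuity correction (chi-square, 1 df)."""
    n = b + c
    if n == 0:
        return 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / n
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(stat / 2))


# Split sizes from the abstract: development, tuning 1, tuning 2, test (hypothetical seed).
dev, tune1, tune2, test = partition(range(1314), [993, 109, 100, 112], seed=42)

# Hypothetical paired calls on eight test images (1 = abnormal, 0 = normal).
truth = [1, 1, 1, 1, 1, 0, 0, 0]
ai = [1, 1, 1, 0, 1, 0, 0, 1]
physician = [1, 0, 0, 1, 1, 0, 1, 0]
b, c = discordant_counts(truth, ai, physician)
print(b, c, mcnemar_exact(b, c))  # 3 2 1.0
```

Only the discordant pairs (b and c) enter the McNemar statistic; images that both readers classified correctly, or both incorrectly, carry no information about which reader is better.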

RESULTS

The accuracy of the AI model on the test set was 80.4% (95% confidence interval [CI] 71.8%-87.3%), and the area under the receiver operating characteristic curve (AUROC) was 0.872 (95% CI 0.831-0.947). The performance of the AI model vs. the physician group on the test set was: sensitivity 79.0% (95% CI: 68.4%-89.5%) vs. 64.9% (95% CI: 52.5%-77.3%; P = 0.088); and specificity 81.8% (95% CI: 71.6%-92.0%) vs. 87.3% (95% CI: 78.5%-96.1%; P = 0.439).
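The sensitivity and specificity intervals above are consistent with a normal-approximation (Wald) interval. The sketch below assumes that method, together with counts back-calculated from the reported percentages: 57 abnormal and 55 normal test images, with 45/57 and 45/55 correct for the AI model and 37/57 and 48/55 for the physician group. These counts are an inference, not figures stated in the abstract. With them, the interval bounds match those reported; note that 45/57 rounds to 78.9% rather than the quoted 79.0%. The accuracy interval (71.8%-87.3%) is not symmetric about 80.4%, so it was presumably computed with a different method (for example, an exact binomial interval), which this sketch does not attempt.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(successes, total, level=0.95):
    """Return (proportion, lower, upper) using the normal-approximation (Wald) interval."""
    p = successes / total
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def as_percent(interval):
    """Express a (proportion, lower, upper) triple in percent, rounded to one decimal."""
    return tuple(round(100 * x, 1) for x in interval)


# Counts inferred from the reported percentages (assumed: 57 abnormal, 55 normal images).
results = {
    "AI sensitivity": wald_ci(45, 57),
    "AI specificity": wald_ci(45, 55),
    "Physician sensitivity": wald_ci(37, 57),
    "Physician specificity": wald_ci(48, 55),
}
for name, interval in results.items():
    print(name, as_percent(interval))
```

The Wald interval is simple but can be poorly calibrated for small samples or for proportions near 0 or 1; Wilson or exact (Clopper-Pearson) intervals are usually preferred in those cases.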

CONCLUSION

The AI model achieved a good AUROC and a higher sensitivity than the clinician group, with the P-value at nominal significance.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/04a3/12063939/55f0505f673a/SMJ-66-208-g001.jpg
