Impact of test set composition on AI performance in pediatric wrist fracture detection in X-rays.

Authors

Till Tristan, Scherkl Mario, Stranger Nikolaus, Singer Georg, Hankel Saskia, Flucher Christina, Hržić Franko, Štajduhar Ivan, Tschauner Sebastian

Affiliations

Division of Pediatric Radiology, Department of Radiology, Medical University of Graz, Auenbruggerplatz 34, Graz, 8036, Austria.

Department of Pediatric and Adolescent Surgery, Medical University of Graz, Auenbruggerplatz 34, Graz, 8036, Austria.

Publication information

Eur Radiol. 2025 May 16. doi: 10.1007/s00330-025-11669-z.

PMID:40379941
Abstract

OBJECTIVES

To evaluate how different test set sampling strategies (random selection and balanced sampling) affect the performance of artificial intelligence (AI) models in pediatric wrist fracture detection using radiographs, aiming to highlight the need for standardization in test set design.

MATERIALS AND METHODS

This retrospective study utilized the open-sourced GRAZPEDWRI-DX dataset of 6091 pediatric wrist radiographs. Two test sets, each containing 4588 images, were constructed: one using a balanced approach based on case difficulty, projection type, and fracture presence and the other a random selection. EfficientNet and YOLOv11 models were trained and validated on 18,762 radiographs and tested on both sets. Binary classification and object detection tasks were evaluated using metrics such as precision, recall, F1 score, AP50, and AP50-95. Statistical comparisons between test sets were performed using nonparametric tests.
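The two sampling strategies described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the record fields (`difficulty`, `projection`, `fracture`), the function names, and the equal-count-per-stratum rule are assumptions made for the example, since the abstract does not specify how the balancing was implemented. The records are synthetic and are not drawn from GRAZPEDWRI-DX.

```python
import random
from collections import defaultdict


def random_test_set(records, n, seed=0):
    """Draw n records uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, n)


def balanced_test_set(records, n_per_stratum, seed=0):
    """Draw the same number of records from every stratum.

    A stratum is one combination of (difficulty, projection, fracture).
    Strata smaller than n_per_stratum contribute all of their records.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        key = (rec["difficulty"], rec["projection"], rec["fracture"])
        strata[key].append(rec)

    selected = []
    for key in sorted(strata):  # fixed order keeps the draw reproducible
        pool = strata[key]
        selected.extend(rng.sample(pool, min(n_per_stratum, len(pool))))
    return selected


def hard_fraction(sample):
    """Share of records labelled as difficult."""
    return sum(r["difficulty"] == "hard" for r in sample) / len(sample)


# Synthetic toy records (hypothetical): about 10% of cases are "hard".
toy = [
    {
        "id": i,
        "difficulty": "hard" if i % 10 == 0 else "easy",
        "projection": "lateral" if (i // 10) % 2 else "ap",
        "fracture": i % 3 == 0,
    }
    for i in range(1000)
]

random_set = random_test_set(toy, 80)
balanced_set = balanced_test_set(toy, 10)

print(f"hard cases, random set:   {hard_fraction(random_set):.2f}")
print(f"hard cases, balanced set: {hard_fraction(balanced_set):.2f}")
```

In this toy setup the balanced set contains hard and easy cases in equal numbers, while the random set mirrors the easy-dominated source pool. That difference in composition is the kind of shift the study compares.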

RESULTS

Performance metrics significantly decreased in the balanced test set with more challenging cases. For example, the precision for YOLOv11 models decreased from 0.95 in the random set to 0.83 in the balanced set. Similar trends were observed for recall, accuracy, and F1 score, indicating that models trained on easy-to-recognize cases performed poorly on more complex ones. These results were consistent across all model variants tested.
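For reference, the binary classification metrics named above follow the standard confusion-matrix definitions, sketched here. The function name, the label encoding (1 = fracture, 0 = no fracture), and the toy labels are assumptions for illustration only; they are not the study's evaluation code or data.

```python
def classification_metrics(y_true, y_pred):
    """Compute precision, recall, accuracy and F1 for binary labels.

    Labels are assumed to be 1 (fracture) or 0 (no fracture).
    Undefined ratios (zero denominator) are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
    }


# Toy labels (hypothetical): 3 true positives, 1 false positive,
# 1 false negative, 3 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Applying the same function to predictions on each test set separately gives the per-set values that a comparison like the one reported here relies on.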

CONCLUSION

AI models for pediatric wrist fracture detection exhibit reduced performance when tested on balanced datasets containing more difficult cases, compared to randomly selected cases. This highlights the importance of constructing representative and standardized test sets that account for clinical complexity to ensure robust AI performance in real-world settings.

KEY POINTS

Question: Do sampling strategies based on sample complexity influence the performance of deep learning models in fracture detection?

Findings: AI performance in pediatric wrist fracture detection drops significantly when tested on balanced datasets with more challenging cases, compared to randomly selected cases.

Clinical relevance: Without standardized and validated test datasets for AI that reflect clinical complexities, performance metrics may be overestimated, limiting the utility of AI in real-world settings.
