Critical evaluation of deep neural networks for wrist fracture detection.

Affiliations

University of Oulu, Oulu, Finland.

Oulu University Hospital, Oulu, Finland.

Publication information

Sci Rep. 2021 Mar 16;11(1):6006. doi: 10.1038/s41598-021-85570-2.

Abstract

Wrist fracture is the most common type of fracture and has a high incidence rate. Conventional radiography (i.e., X-ray imaging) is routinely used for wrist fracture detection, but fracture delineation is occasionally difficult, and additional confirmation by computed tomography (CT) is then needed for diagnosis. Recent advances in Deep Learning (DL), a subfield of Artificial Intelligence (AI), have shown that wrist fracture detection can be automated using Convolutional Neural Networks. However, previous studies did not pay close attention to the difficult cases that can only be confirmed via CT imaging. In this study, we developed and analyzed DeepWrist, a state-of-the-art DL-based pipeline for wrist (distal radius) fracture detection, and evaluated it on one general population test set and one challenging test set comprising only cases that required confirmation by CT. Our results reveal that a typical state-of-the-art approach such as DeepWrist, while achieving near-perfect performance on the general independent test set, performs substantially worse on the challenging test set: average precision of 0.99 (0.99-0.99) versus 0.64 (0.46-0.83), respectively. Similarly, the area under the ROC curve was 0.99 (0.98-0.99) versus 0.84 (0.72-0.93), respectively. Our findings highlight the importance of meticulous analysis of DL-based models before clinical use and reveal the need for more challenging settings for testing medical AI systems.
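The abstract reports average precision and area under the ROC curve as point estimates with intervals in parentheses. As a minimal sketch of how such numbers can be computed, the Python below implements both metrics and a percentile bootstrap interval using only the standard library. The bootstrap method, the function names (`roc_auc`, `average_precision`, `bootstrap_ci`), and the toy labels and scores are all assumptions made for illustration; the abstract does not state how the intervals were obtained, and none of this is the DeepWrist code.

```python
import random


def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision at the rank of each positive, ranked by descending score.

    Tied scores are ordered by their position in the input (a simplification).
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive case")
    return total / hits


def bootstrap_ci(metric, labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumed method, not taken from the paper)."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to resample")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy data: 1 = fracture, 0 = no fracture, with made-up model scores.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)            # 0.75 on this toy data
ap = average_precision(labels, scores)   # about 0.830 on this toy data
auc_ci = bootstrap_ci(roc_auc, labels, scores)
ap_ci = bootstrap_ci(average_precision, labels, scores)
print(f"AUROC {auc:.2f} ({auc_ci[0]:.2f}-{auc_ci[1]:.2f})")
print(f"AP    {ap:.2f} ({ap_ci[0]:.2f}-{ap_ci[1]:.2f})")
```

With a small test set the bootstrap interval is wide, which is consistent with the much wider intervals the abstract reports for the challenging test set than for the general one.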

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e56/7971048/9f4a21215067/41598_2021_85570_Fig1_HTML.jpg
