Shelmerdine Susan C, White Richard D, Liu Hantao, Arthurs Owen J, Sebire Neil J
Department of Clinical Radiology, Great Ormond Street Hospital for Children, London, UK.
Great Ormond Street Hospital for Children, UCL Great Ormond Street Institute of Child Health, London, UK.
Insights Imaging. 2022 Jun 3;13(1):94. doi: 10.1186/s13244-022-01234-3.
The majority of research and commercial efforts have focussed on the use of artificial intelligence (AI) for fracture detection in adults, despite the greater long-term clinical and medicolegal implications of missed fractures in children. The objective of this study was to assess the available literature regarding the diagnostic performance of AI tools for paediatric fracture assessment on imaging and, where available, how this compares with the performance of human readers.
MEDLINE, Embase and Cochrane Library databases were queried for studies published between 1 January 2011 and 2021 using terms related to 'fracture', 'artificial intelligence', 'imaging' and 'children'. Risk of bias was assessed using a modified QUADAS-2 tool. Descriptive statistics for diagnostic accuracies were collated.
Nine eligible articles from 362 publications were included. Most (8/9) evaluated fracture detection on radiographs, and the elbow was the most common body part. Nearly all articles used data derived from a single institution and used deep learning methodology, with only a few (2/9) performing external validation. Accuracy rates generated by AI ranged from 88.8 to 97.9%. In two of the three articles where AI performance was compared to human readers, sensitivity rates for AI were marginally higher, but the difference was not statistically significant.
Wide heterogeneity in the literature with limited information on algorithm performance on external datasets makes it difficult to understand how such tools may generalise to a wider paediatric population. Further research using a multicentric dataset with real-world evaluation would help to better understand the impact of these tools.
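For readers less familiar with the diagnostic accuracy measures referred to above (accuracy and sensitivity), the following is a minimal sketch of how they are conventionally derived from a 2x2 table of AI output against a reference standard. The class name, function names and counts are hypothetical and purely illustrative; they are not taken from any of the included studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table of index test (AI) results against the reference standard."""
    tp: int  # fracture present, AI flags fracture
    fp: int  # fracture absent, AI flags fracture
    fn: int  # fracture present, AI misses it
    tn: int  # fracture absent, AI reports no fracture


def sensitivity(c: ConfusionCounts) -> float:
    """Proportion of true fractures that the AI detects."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Proportion of fracture-free studies that the AI correctly clears."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Proportion of all studies that the AI classifies correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


# Hypothetical counts for illustration only (not data from the review).
example = ConfusionCounts(tp=90, fp=5, fn=10, tn=95)

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(example):.3f}")
    print(f"specificity = {specificity(example):.3f}")
    print(f"accuracy    = {accuracy(example):.3f}")
```

Accuracy depends on the proportion of fractures in the test set, which is one reason why figures from single-institution datasets may not carry over directly to other populations.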