Department of Radiology and Diagnostic Imaging, University of Alberta, Alberta, Canada.
MEDO.Ai Inc, Singapore, Singapore.
J Ultrasound. 2022 Jun;25(2):145-153. doi: 10.1007/s40477-021-00560-4. Epub 2021 Mar 5.
Early diagnosis of developmental dysplasia of the hip (DDH) using ultrasound (US) is safe, effective and inexpensive, but requires high-quality scans. The effect of scan quality on diagnostic accuracy is not well understood, especially as artificial intelligence (AI) begins to automate such diagnosis. In this paper, we developed a 10-point scoring system for reporting DDH US scan quality, evaluated its inter-rater agreement and examined the effect of scan quality on automated assessment by an AI system, MEDO-Hip.
Scoring was based on iliac wing straightness and angulation; visibility of the labrum, os ischium and femoral head; motion; and other artifacts. Four readers, ranging from novice to expert, independently scored the quality of 107 scans with this 10-point scale and with holistic grading on a scale of 1-5. MEDO-Hip interpreted the same scans, either providing a diagnostic category or identifying the scan as uninterpretable.
Inter-rater agreement for the 10-point scale was significantly higher than for holistic scoring (ICC 0.68 vs 0.93, p < 0.05). Inter-rater agreement on the categorisation of individual features, measured by Cohen's kappa, was highest for the os ischium (0.67 ± 0.06), femoral head (0.65 ± 0.07) and iliac wing (0.49 ± 0.12) indices, and lower for the presence of the labrum (0.21 ± 0.19). MEDO-Hip interpreted all images with a quality score > 7 and flagged 13/107 scans as uninterpretable. The flagged scans were of low quality (3 ± 1.2 vs. 7 ± 1.8 for the others, p < 0.05), with poor visualization of the os ischium and noticeable motion. AI accuracy in cases with quality scores ≤ 7 was 57% vs. 89% in the other cases, p < 0.01.
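For readers unfamiliar with the per-feature agreement statistic, the following is a minimal sketch of Cohen's kappa for one pair of readers. The function name `cohens_kappa` and the example ratings are hypothetical and are not taken from the study; the abstract does not state how kappa values were combined across the four readers, so only the standard two-rater formula is shown.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    label frequencies.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must label the same non-empty set of items")

    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if p_e == 1.0:
        # Both raters used a single identical label; agreement is perfect.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical example: two readers marking one feature as visible (1) or not (0).
reader_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
reader_2 = [1, 0, 0, 0, 1, 0, 1, 1, 1, 1]
kappa = cohens_kappa(reader_1, reader_2)
```

In this made-up example the readers agree on 8 of 10 scans, but because about half of that agreement would be expected by chance, kappa is noticeably lower than the raw agreement rate.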
This study shows that our scoring system reliably characterises scan quality and identifies cases likely to be misinterpreted by AI. This could lead to more accurate use of AI in DDH diagnosis by flagging, up front, low-quality scans that are likely to yield a poor diagnosis.
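The up-front flagging described above could be sketched as a simple quality gate applied before AI interpretation. This is only an illustration under stated assumptions: the function name `split_by_quality`, the scan identifiers and the scores are hypothetical, and the cutoff of 7 is borrowed from the accuracy comparison reported in the results rather than from any documented MEDO-Hip setting.

```python
def split_by_quality(quality_scores, threshold=7):
    """Split scans into those passed to the AI and those flagged for rescanning.

    quality_scores maps a scan identifier to its score on the 10-point scale.
    Scans scoring at or below the threshold are flagged (assumed cutoff).
    """
    accepted = [scan for scan, score in quality_scores.items() if score > threshold]
    flagged = [scan for scan, score in quality_scores.items() if score <= threshold]
    return accepted, flagged


# Hypothetical quality scores for four scans.
scores = {"scan_a": 9, "scan_b": 7, "scan_c": 3, "scan_d": 8}
accepted, flagged = split_by_quality(scores)
```

A gate of this kind trades throughput for reliability: a higher threshold sends fewer scans to the AI but avoids the lower-accuracy range.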