Department of Ultrasound, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, People's Republic of China.
School of Information Sciences and Technology, Northwest University, Xi'an, Shaanxi Province, People's Republic of China.
Rheumatology (Oxford). 2024 Mar 1;63(3):866-873. doi: 10.1093/rheumatology/kead366.
We aimed to investigate the value of deep learning (DL) models based on multimodal ultrasonographic (US) images to quantify RA activity.
Static greyscale (SGS), dynamic greyscale (DGS), static power Doppler (SPD) and dynamic power Doppler (DPD) US images were collected and evaluated by two expert radiologists according to the EULAR-OMERACT Synovitis Scoring system. Four DL models were developed based on a ResNet-type structure and evaluated on two separate test cohorts, and their performance was then compared with that of 12 radiologists with different levels of experience.
In total, 1244 images were used for model training, and 152 and 354 for testing (cohorts 1 and 2, respectively). The best-performing models for the scores of 0/1/2/3 were the DPD, SGS, DGS and SPD models, respectively (Area Under the receiver operating characteristic Curve [AUC] = 0.87/0.95/0.74/0.95; no significant differences). All the DL models provided results comparable to those of the experienced radiologists on a per-image basis (intraclass correlation coefficient: 0.239-0.756, P < 0.05). The SPD model performed better than the SGS model on test cohort 1 (score of 0/2/3: AUC = 0.82/0.67/0.95 vs 0.66/0.66/0.75, respectively) and on test cohort 2 (score of 0: AUC = 0.89 vs 0.81). The dynamic DL models performed better than the static ones in most of the scoring processes and were more accurate than most of the senior radiologists, especially the DPD model.
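The per-score AUCs reported above can be read as one-vs-rest comparisons: for each score (0, 1, 2 or 3), images with that score are treated as positives and all other images as negatives. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes each model outputs one probability per score for every image; the function names `auc_one_vs_rest` and `per_score_auc` are hypothetical, and the AUC is computed through its equivalence with the Mann-Whitney statistic.

```python
def auc_one_vs_rest(labels, scores, positive):
    """AUC for one synovitis score versus all others (illustrative helper).

    Uses the Mann-Whitney interpretation: the AUC is the probability that a
    randomly chosen positive image receives a higher model score than a
    randomly chosen negative image, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative image")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_score_auc(labels, probs, grades=(0, 1, 2, 3)):
    """One-vs-rest AUC for each score.

    Assumption: probs[i][g] is the model's predicted probability that
    image i has score g, and labels[i] is the expert reference score.
    """
    return {g: auc_one_vs_rest(labels, [row[g] for row in probs], g)
            for g in grades}
```

Applied to each of the four models and each test cohort, a routine like this yields one AUC per score, which is the form in which the results above are reported. The pairwise loop is quadratic in the number of images, which is acceptable for test sets of a few hundred images.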
DL models based on multimodal US images allow a quantitative and objective assessment of RA activity. Dynamic DL models in particular have potential value in assisting radiologists to improve the accuracy of RA US-based grading.