Department of Radiology, Stanford University, Palo Alto, California, USA.
Electrical Engineering, Stanford University, Palo Alto, California, USA.
J Magn Reson Imaging. 2023 Apr;57(4):1029-1039. doi: 10.1002/jmri.28365. Epub 2022 Jul 19.
Deep learning (DL)-based automatic segmentation models can expedite manual segmentation yet require resource-intensive fine-tuning before deployment on new datasets. The generalizability of DL methods to new datasets without fine-tuning is not well characterized.
Evaluate the generalizability of DL-based models by deploying pretrained models on independent datasets varying by MR scanner, acquisition parameters, and subject population.
Retrospective based on prospectively acquired data.
Overall test dataset: 59 subjects (26 females); Study 1: 5 healthy subjects (zero females), Study 2: 8 healthy subjects (eight females), Study 3: 10 subjects with osteoarthritis (eight females), Study 4: 36 subjects with various knee pathology (10 females).
FIELD STRENGTH/SEQUENCE: 3-T, quantitative double-echo steady state (qDESS).
Four annotators manually segmented knee cartilage. Each reader segmented one of four qDESS datasets in the test dataset. Two DL models, one trained on qDESS data and another on Osteoarthritis Initiative (OAI)-DESS data, were assessed. Manual and automatic segmentations were compared by quantifying variations in segmentation accuracy, volume, and T2 relaxation times for superficial and deep cartilage.
Dice similarity coefficient (DSC) for segmentation accuracy. Lin's concordance correlation coefficient (CCC), Wilcoxon rank-sum tests, root-mean-squared error-coefficient-of-variation to quantify manual vs. automatic T2 and volume variations. Bland-Altman plots for manual vs. automatic T2 agreement. A P value < 0.05 was considered statistically significant.
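The agreement metrics named above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the CCC uses population (ddof=0) variances, and the RMSE-CV is assumed here to be the root-mean-squared error divided by the mean of the manual measurements, expressed as a percentage; the study may define it differently.

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary segmentation masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * np.logical_and(a, b).sum() / denom

def lins_ccc(x, y):
    """Lin's concordance correlation coefficient (population variances, ddof=0)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return 2.0 * cov / (x.var() + y.var() + (mx - my) ** 2)

def rmse_cv(manual, auto):
    """RMSE as a percentage of the mean manual value (assumed definition)."""
    manual = np.asarray(manual, dtype=float)
    auto = np.asarray(auto, dtype=float)
    rmse = np.sqrt(np.mean((auto - manual) ** 2))
    return 100.0 * rmse / manual.mean()
```

Unlike Pearson correlation, the CCC penalizes a constant offset between manual and automatic values through the squared mean difference in the denominator, which is why it suits a manual vs. automatic agreement question.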
DSCs for the qDESS-trained model, 0.79-0.93, were higher than those for the OAI-DESS-trained model, 0.59-0.79. T2 and volume CCCs for the qDESS-trained model, 0.75-0.98 and 0.47-0.95, were higher than respective CCCs for the OAI-DESS-trained model, 0.35-0.90 and 0.13-0.84. Bland-Altman 95% limits of agreement for superficial and deep cartilage T2 were narrower for the qDESS-trained model, ±2.4 msec and ±4.0 msec, than for the OAI-DESS-trained model, ±4.4 msec and ±5.2 msec.
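For readers unfamiliar with the ± values quoted for the Bland-Altman analysis, the following is a minimal sketch of how 95% limits of agreement are conventionally computed: the mean difference (bias) plus or minus 1.96 times the sample standard deviation of the differences. The function name and the example T2 values are hypothetical and are not data from the study.

```python
import numpy as np

def bland_altman_limits(manual, auto):
    """Return (bias, lower limit, upper limit) for auto minus manual."""
    manual = np.asarray(manual, dtype=float)
    auto = np.asarray(auto, dtype=float)
    diff = auto - manual
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)  # sample standard deviation
    return bias, bias - half_width, bias + half_width

# Hypothetical T2 values in msec, for illustration only
manual_t2 = [30.0, 32.0, 34.0, 36.0]
auto_t2 = [31.0, 31.0, 35.0, 37.0]
bias, lower, upper = bland_altman_limits(manual_t2, auto_t2)
print(f"bias = {bias:.2f} msec, limits = [{lower:.2f}, {upper:.2f}] msec")
```

A smaller half-width means the automatic T2 values scatter less around the manual ones, which is the sense in which the limits for one model can be tighter than for the other.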
The qDESS-trained model may generalize well to independent qDESS datasets regardless of MR scanner, acquisition parameters, and subject population.
EVIDENCE LEVEL: 1. TECHNICAL EFFICACY: Stage 1.