Department of Diagnostic and Interventional Radiology, Technische Universität München, Munich, 81675, Germany.
Department of Radiology, Humanitas Clinical and Research Hospital, Rozzano, Milan, 20090, Italy.
Med Phys. 2018 Oct;45(10):4439-4447. doi: 10.1002/mp.13151. Epub 2018 Sep 18.
The purpose of this study was to evaluate anthropomorphic model observers, trained with neural networks, for predicting a human observer's performance.
To simulate liver lesions, a phantom with contrast targets (acrylic spheres of varying diameters, +30 HU) was repeatedly scanned on a computed tomography scanner. Images labeled with confidence ratings from a reader study of a liver lesion detection task were used to build several anthropomorphic model observers. Models were trained with images reconstructed with iterative reconstruction and evaluated with images reconstructed with filtered backprojection. A neural network based on softmax regression (SR-MO) and convolutional neural networks (CNN-MO) were used to predict the performance of a human observer and were compared to a channelized Hotelling observer with Gabor channels and internal channel noise (CHOi). Model observers were evaluated by receiver operating characteristic (ROC) curve analysis and compared to the results of the reader study. Two strategies were used to train the SR-MO and CNN-MO: (A) building a separate model for each lesion size; (B) building one model that was applied to lesions of all sizes.
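The abstract does not state how the area under the ROC curve was estimated from the confidence ratings. As a minimal sketch, assuming a nonparametric (Mann-Whitney) estimate, the AUC for one observer could be computed as below. The function name `auc_from_ratings` and the example ratings are hypothetical and are not taken from the study.

```python
def auc_from_ratings(signal_present, signal_absent):
    """Nonparametric AUC: probability that a lesion-present image receives
    a higher confidence rating than a lesion-absent image (ties count 0.5)."""
    wins = 0.0
    for s in signal_present:
        for n in signal_absent:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5
    return wins / (len(signal_present) * len(signal_absent))


# Hypothetical confidence ratings on a 1-5 scale (illustrative only).
ratings_with_lesion = [3, 4, 5]
ratings_without_lesion = [1, 2, 3]
auc = auc_from_ratings(ratings_with_lesion, ratings_without_lesion)
```

The same function can be applied to the ratings of the human observer and to the output scores of a model observer, which gives one AUC per observer, lesion size, and dose level.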
All tested model observers and the human observer were highly correlated at each lesion size and dose level. With strategy A, Pearson's product-moment correlation coefficients r were 0.926 (95% confidence interval (CI): 0.679-0.985) for SR-MO and 0.979 (95% CI: 0.902-0.996) for CNN-MO. With strategy B, r was 0.860 (95% CI: 0.454-0.970) for SR-MO and 0.918 (95% CI: 0.651-0.983) for CNN-MO. For CHOi, r was 0.945 (95% CI: 0.755-0.989). With strategy A, mean absolute percentage differences (MAPD) between the model observers and the human observer were 3.7% for SR-MO and 1.2% for CNN-MO. With strategy B, MAPD were 3.7% for SR-MO and 3.0% for CNN-MO. For the CHOi the MAPD was 2.2%.
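The agreement statistics reported above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the confidence interval uses the Fisher z-transformation, and MAPD is assumed to be the mean of |model - human| / human, expressed in percent. The abstract gives neither the exact MAPD formula nor the number of paired data points, so the function names, the sample size, and the AUC values below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for r via the Fisher z-transformation (needs n > 3)."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


def mapd(model, human):
    """Mean absolute percentage difference (assumed definition, in percent)."""
    return 100.0 * sum(abs(m - h) / h for m, h in zip(model, human)) / len(human)


# Hypothetical AUC values per lesion size and dose level (illustrative only).
human_auc = [0.62, 0.70, 0.78, 0.74, 0.83, 0.90, 0.85, 0.93, 0.97]
model_auc = [0.60, 0.72, 0.77, 0.76, 0.82, 0.91, 0.86, 0.92, 0.98]

r = pearson_r(model_auc, human_auc)
ci_low, ci_high = fisher_ci(r, len(human_auc))
difference = mapd(model_auc, human_auc)
```

With only a handful of paired points, the Fisher interval is wide even when r is high, which matches the broad confidence intervals reported for the correlation coefficients.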
Convolutional neural network model observers can accurately predict the performance of a human observer for all lesion sizes and dose levels in the evaluated signal detection task.