Kang Dong-Wan, Park Gi-Hun, Ryu Wi-Sun, Schellingerhout Dawid, Kim Museong, Kim Yong Soo, Park Chan-Young, Lee Keon-Joo, Han Moon-Ku, Jeong Han-Gil, Kim Dong-Eog
Department of Public Health, Seoul National University Bundang Hospital, Seongnam, Republic of Korea.
Department of Neurology, Gyeonggi Provincial Medical Center, Icheon Hospital, Icheon, Republic of Korea.
Front Neurol. 2023 Dec 29;14:1321964. doi: 10.3389/fneur.2023.1321964. eCollection 2023.
Multiple attempts at intracranial hemorrhage (ICH) detection using deep-learning techniques have been plagued by clinical failures. We aimed to compare the performance of a deep-learning algorithm for ICH detection when trained on strongly versus weakly annotated datasets, and to assess whether a weighted ensemble model that integrates separate models, each trained on a dataset of a different ICH category (subtype or lesion size), improves performance.
We used brain CT scans from the Radiological Society of North America (27,861 CT scans, 3,528 ICHs) and AI-Hub (53,045 CT scans, 7,013 ICHs) for training. DenseNet121, InceptionResNetV2, MobileNetV2, and VGG19 were trained on strongly and weakly annotated datasets and compared using independent external test datasets. We then developed a weighted ensemble model combining separate models trained on all ICH, subdural hemorrhage (SDH), subarachnoid hemorrhage (SAH), and small-lesion ICH cases. The final weighted ensemble model was compared with four well-known deep-learning models. After external testing, six neurologists reviewed 91 ICH cases that were difficult for both AI and humans.
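The weighted combination described above can be illustrated with a minimal sketch. It assumes each component model outputs a per-scan ICH probability and that the ensemble score is a normalized weighted average of those probabilities; the abstract does not state how the weights were chosen or combined, so the function name, model keys, and weights below are hypothetical.

```python
from typing import Dict, List, Sequence


def weighted_ensemble(
    probs: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Combine per-model ICH probabilities into one score per scan.

    probs   maps a model name to its predicted probabilities (one per scan).
    weights maps the same model names to non-negative weights.
    Assumption: scores are combined as a normalized weighted average.
    """
    if not probs:
        raise ValueError("at least one model is required")
    total = sum(weights[name] for name in probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_scans = len(next(iter(probs.values())))
    combined = [0.0] * n_scans
    for name, model_probs in probs.items():
        if len(model_probs) != n_scans:
            raise ValueError(f"model {name!r} has a different number of scans")
        w = weights[name] / total  # normalize so the weights sum to 1
        for i, p in enumerate(model_probs):
            combined[i] += w * p
    return combined


# Hypothetical outputs of four component models on three scans.
example_probs = {
    "all_ich": [0.10, 0.55, 0.90],
    "sdh": [0.05, 0.70, 0.60],
    "sah": [0.08, 0.40, 0.95],
    "small_lesion": [0.20, 0.65, 0.50],
}
# Hypothetical weights; in practice they would be tuned on validation data.
example_weights = {"all_ich": 0.4, "sdh": 0.2, "sah": 0.2, "small_lesion": 0.2}
ensemble_scores = weighted_ensemble(example_probs, example_weights)
```

Normalizing the weights keeps the combined score in the range 0 to 1, so it can be thresholded or ranked in the same way as the output of a single model.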
InceptionResNetV2, MobileNetV2, and VGG19 models performed better when trained on strongly annotated datasets than on weakly annotated ones. A weighted ensemble model combining models trained on SDH, SAH, and small-lesion ICH had a higher AUC than a model trained on all ICH cases only. The ensemble model also outperformed the four single deep-learning models (AUC [95% CI]: ensemble model, 0.953 [0.938-0.965]; InceptionResNetV2, 0.852 [0.828-0.873]; DenseNet121, 0.875 [0.852-0.895]; VGG19, 0.796 [0.770-0.821]; MobileNetV2, 0.650 [0.620-0.680]; P < 0.0001). In addition, the case review showed that a better understanding and management of difficult cases may facilitate clinical use of ICH detection algorithms.
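For readers unfamiliar with how an AUC and its 95% confidence interval are obtained, the following is a minimal sketch. It computes the AUC as the probability that a randomly chosen ICH-positive scan scores higher than a randomly chosen negative one, and estimates the interval with a percentile bootstrap. The abstract does not say which interval method the authors used, so the bootstrap, the function names, and the default parameters here are assumptions for illustration only.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile-bootstrap confidence interval for the AUC (assumed method)."""
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    indices = range(len(labels))
    stats: List[float] = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]  # resample with replacement
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

The pairwise loop is quadratic in the number of scans, which is acceptable for a small illustration; a rank-based implementation would be preferable for test sets of the size reported here.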
We propose a weighted ensemble model for ICH detection, trained on large-scale, strongly annotated CT scans, as no model can capture all aspects of complex tasks.