Department of Surgery, Keio University School of Medicine, Tokyo, Japan.
Fixstars Corporation, Tokyo, Japan.
Cancer Sci. 2022 Oct;113(10):3528-3534. doi: 10.1111/cas.15511. Epub 2022 Aug 3.
Although ultrasound categorization using the Breast Imaging Reporting and Data System (BI-RADS) has become widespread worldwide, inter-observer variability remains a problem. To keep diagnostic accuracy uniform, we developed a system in which artificial intelligence (AI) determines whether a static breast ultrasound image represents BI-RADS3 or lower or BI-RADS4a or higher. This distinction guides the medical management of a patient whose breast ultrasound shows abnormalities. To establish and validate the AI system, we collected and annotated two datasets. The training dataset consisted of 4028 images containing 5014 lesions, and the test dataset consisted of 3166 images containing 3656 lesions. We adjusted the internal parameters of the AI system to select a setting that maximized the area under the curve (AUC) and minimized the difference between sensitivity and specificity. This setting achieved an AUC of 0.95, a sensitivity of 91.2%, and a specificity of 90.7%. We also compared the diagnostic accuracy of 20 clinicians with that of the AI system on 30 images extracted from the test data, and the AI system was significantly superior to the clinicians (McNemar test, p < 0.001). Deep-learning methods that classify breast ultrasound tumors as benign or malignant have been reported extensively. Our work, however, is the first to successfully establish an AI system that classifies lesions as BI-RADS3 or lower versus BI-RADS4a or higher, a distinction with direct implications for clinical action. These results suggest that the AI diagnostic system is sufficiently developed to proceed to the next stage of clinical application.
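The abstract describes two evaluation steps: choosing an operating point that balances sensitivity and specificity, and comparing the AI with clinicians using McNemar's test. The sketch below illustrates both on toy data. It rests on several assumptions that the abstract does not confirm. It assumes the model outputs a per-lesion score in which higher values mean BI-RADS4a or higher. It balances sensitivity and specificity only by moving a decision threshold over that score. The authors instead tuned unspecified internal parameters, which is also what changes the AUC. It computes AUC with the Mann-Whitney pairwise formulation. It uses the exact binomial form of McNemar's test on discordant pair counts `b` and `c`. The paper may have used the chi-square version. All function names are hypothetical.

```python
from math import comb


def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called positive.

    labels: 1 = BI-RADS4a or higher, 0 = BI-RADS3 or lower (assumed encoding).
    Assumes both classes are present, otherwise a division by zero occurs.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) minimizing |sens - spec|.

    Candidate thresholds are the observed scores; the first minimum wins ties.
    """
    best = None
    for t in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, t)
        gap = abs(sens - spec)
        if best is None or gap < best[0]:
            best = (gap, t, sens, spec)
    return best[1:]


def auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking (Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value from discordant counts.

    b: cases the AI got right and the reader got wrong; c: the reverse.
    Under H0 the discordant cases split as Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Toy scores and labels, not data from the study.
    scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
    labels = [0, 0, 1, 1, 0, 1]
    print("AUC:", auc(scores, labels))
    print("balanced operating point:", balanced_threshold(scores, labels))
    print("McNemar exact p (b=1, c=9):", mcnemar_exact(1, 9))
```

Balancing sensitivity against specificity, rather than maximizing accuracy alone, keeps the operating point from favoring whichever class dominates the test set. The balanced threshold is one of the two aims the abstract states.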