Sugimoto Kento, Wada Shoya, Konishi Shozo, Sato Junya, Okada Katsuki, Kido Shoji, Tomiyama Noriyuki, Matsumura Yasushi, Takeda Toshihiro
Department of Medical Informatics, Osaka University Graduate School of Medicine, 2-2 Yamadaoka, Suita, 565-0871, Osaka, Japan.
Department of Transformative System for Medical Information, Osaka University Graduate School of Medicine, 2-2 Yamadaoka, Suita, 565-0871, Osaka, Japan.
J Imaging Inform Med. 2025 Jan 22. doi: 10.1007/s10278-024-01338-w.
Missed critical imaging findings, particularly those suggestive of cancer, are a common problem that can delay patient follow-up and treatment. To address this, we developed a rule-based natural language processing (NLP) algorithm to detect cancer-suspicious findings in Japanese radiology reports. The dataset consisted of chest and abdomen CT reports from six institutions. Reports from our institution were used for algorithm development and internal evaluation, and reports from the other five institutions were used for external evaluation. Two experienced physicians annotated the reports to create the gold standard. Performance was evaluated using precision, recall, and F1 score with 1000 bootstrap iterations. BERT served as a baseline deep learning model, and its performance was compared with that of the proposed rule-based method. At the report level, the overall precision, recall, and F1 score of the rule-based algorithm were 0.886, 0.886, and 0.883, respectively, higher than those of the deep learning model (0.851, 0.679, and 0.733). These overall results include both internal and external validation data. For the internal validation set, the precision, recall, and F1 score were 0.929, 0.929, and 0.927, respectively. For the external validation set, the precision, recall, and F1 score were 0.875, 0.879, and 0.873, respectively, demonstrating generalizability. In conclusion, the rule-based NLP algorithm achieved high performance in detecting cancer-suspicious findings in multi-institutional CT reports.
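The evaluation described above (precision, recall, and F1 score with 1000 bootstrap iterations) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `prf1` and `bootstrap_prf1`, the binary 0/1 labels, the positive-class-only metrics, and the percentile interval are all assumptions made for illustration, and the abstract does not state how the reported scores were averaged.

```python
import random


def prf1(gold, pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def bootstrap_prf1(gold, pred, n_iter=1000, seed=0):
    """Resample reports with replacement and summarize each metric.

    Returns {metric: (mean, lower, upper)}, where lower/upper are the
    2.5th and 97.5th percentiles of the bootstrap distribution
    (an assumed choice of interval, not taken from the abstract).
    """
    rng = random.Random(seed)
    n = len(gold)
    samples = []
    for _ in range(n_iter):
        # Draw n report indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(prf1([gold[i] for i in idx], [pred[i] for i in idx]))

    lo_idx = n_iter * 25 // 1000
    hi_idx = max(n_iter * 975 // 1000 - 1, lo_idx)
    result = {}
    for k, name in enumerate(("precision", "recall", "f1")):
        values = sorted(s[k] for s in samples)
        result[name] = (sum(values) / n_iter, values[lo_idx], values[hi_idx])
    return result


if __name__ == "__main__":
    # Toy labels only: 1 = report contains a cancer-suspicious finding.
    gold = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
    pred = [1, 1, 0, 0, 0, 1, 1, 1, 0, 0]
    for metric, (mean, lower, upper) in bootstrap_prf1(gold, pred).items():
        print(f"{metric}: {mean:.3f} ({lower:.3f}-{upper:.3f})")
```

Fixing the random seed keeps the resampling reproducible. With real data, `gold` would hold the physicians' report-level annotations and `pred` the algorithm's report-level outputs.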