Das Avisha, Talati Ish A, Chaves Juan Manuel Zambrano, Rubin Daniel, Banerjee Imon
Arizona Advanced AI & Innovation (A3I) Hub, Mayo Clinic Arizona, Phoenix, AZ, USA.
Department of Radiology, Stanford University, Stanford, CA, USA.
NPJ Digit Med. 2025 May 8;8(1):257. doi: 10.1038/s41746-025-01522-4.
Critical findings in radiology reports are life-threatening conditions that must be communicated promptly to physicians for timely patient management. Although challenging, advances in natural language processing (NLP), particularly large language models (LLMs), now enable the automated identification of key findings in verbose reports. Given the scarcity of labeled critical-findings data, we implemented a two-phase, weakly supervised fine-tuning approach on 15,000 unlabeled Mayo Clinic reports. The fine-tuned model then automatically extracted critical terms on internal (Mayo Clinic, n = 80) and external (MIMIC-III, n = 123) test datasets, validated against expert annotations. Model performance was further assessed on 5000 MIMIC-IV reports using the LLM-aided metrics G-eval and Prometheus. Both manual and LLM-based evaluations showed improved task alignment with weak supervision. The pipeline and model, publicly available under an academic license, can aid critical-finding extraction for research and clinical use (https://github.com/dasavisha/CriticalFindings_Extract).
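To illustrate the weak-supervision idea described in the abstract, the sketch below shows how rule-based pseudo-labels might be generated from unlabeled reports (phase 1) to supply training pairs for fine-tuning an LLM extractor (phase 2). This is a minimal, hypothetical example: the term list, report texts, and function names are assumptions, not the authors' actual labeling rules or model.

```python
# Hypothetical sketch of weakly supervised pseudo-labeling for critical
# findings. The term list and reports below are illustrative only.
CRITICAL_TERMS = [
    "pneumothorax",
    "pulmonary embolism",
    "aortic dissection",
    "free air",
    "intracranial hemorrhage",
]

def weak_label(report: str) -> list[str]:
    """Return critical terms found in a report via case-insensitive matching."""
    text = report.lower()
    return [term for term in CRITICAL_TERMS if term in text]

def build_pseudo_dataset(reports: list[str]) -> list[dict]:
    """Pair each report with its weakly labeled findings as fine-tuning data."""
    return [{"input": r, "findings": weak_label(r)} for r in reports]

reports = [
    "CT chest: small right apical pneumothorax, no pleural effusion.",
    "Unremarkable abdominal ultrasound.",
]
dataset = build_pseudo_dataset(reports)
print(dataset[0]["findings"])  # → ['pneumothorax']
print(dataset[1]["findings"])  # → []
```

In a two-phase setup such as the one described, these noisy pseudo-labeled pairs would replace scarce expert annotations as supervision for the first fine-tuning phase, with later validation against expert-annotated test sets.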