He Zebang, Wong Alex Ngai Nick, Yoo Jung Sun
Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Kowloon, Hong Kong Special Administrative Region of China.
DOBI Medical International Inc., Hangzhou, China.
Comput Biol Med. 2025 Jul 3;196(Pt A):110625. doi: 10.1016/j.compbiomed.2025.110625.
Radiology reports are essential in medical imaging, providing critical insights for diagnosis, treatment, and patient management by bridging the gap between radiologists and referring physicians. However, the manual generation of radiology reports is time-consuming and labor-intensive, leading to inefficiencies and delays in clinical workflows, particularly as case volumes increase. Although deep learning approaches have shown promise in automating radiology report generation, existing methods, particularly those based on the encoder-decoder framework, suffer from significant limitations. These include a lack of explainability, owing to the black-box features produced by the encoder, and limited adaptability to diverse clinical settings.
In this study, we address these challenges by proposing a novel deep learning framework for radiology report generation that enhances explainability, accuracy, and adaptability. Our approach replaces traditional black-box features in computer vision with transparent keyword lists, improving the interpretability of the feature extraction process. To generate these keyword lists, we apply a multi-label classification technique, which is further enhanced by an automatic keyword adaptation mechanism. This adaptation dynamically configures the multi-label classification to better adapt to specific clinical environments, reducing the reliance on manually curated reference keyword lists and improving model adaptability across diverse datasets. We also introduce a frequency-based multi-label classification strategy to address the issue of keyword imbalance, ensuring that rare but clinically significant terms are accurately identified. Finally, we leverage a pre-trained text-to-text large language model (LLM) to generate human-like, clinically relevant radiology reports from the extracted keyword lists, ensuring linguistic quality and clinical coherence.
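The two core ideas above can be illustrated with a minimal sketch: inverse-frequency weighting so that rare but clinically significant keywords carry more weight in a multi-label loss, and assembling the predicted keyword list into a prompt for a text-to-text LLM. This is not the authors' implementation; the keyword vocabulary, the weighting formula, and the prompt template are all illustrative assumptions.

```python
import numpy as np

def frequency_weights(label_matrix: np.ndarray) -> np.ndarray:
    """Per-keyword inverse-frequency weights: rarer keywords get larger
    weights so they are not drowned out by common findings.
    (Assumed weighting scheme, for illustration only.)"""
    freq = label_matrix.mean(axis=0)         # fraction of studies with each keyword
    return 1.0 / np.clip(freq, 1e-6, None)   # rare keyword -> large weight

def weighted_bce(probs: np.ndarray, targets: np.ndarray,
                 weights: np.ndarray) -> float:
    """Frequency-weighted binary cross-entropy over all keywords."""
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    loss = -(weights * (targets * np.log(probs)
                        + (1 - targets) * np.log(1 - probs)))
    return float(loss.mean())

def keywords_to_prompt(probs, vocab, threshold=0.5) -> str:
    """Threshold keyword probabilities and build a text-to-text LLM prompt.
    (Hypothetical prompt template.)"""
    picked = [kw for p, kw in zip(probs, vocab) if p >= threshold]
    return ("Write a radiology report for a chest X-ray with findings: "
            + ", ".join(picked) + ".")

# Toy example: 4 studies, 3 candidate keywords.
vocab = ["cardiomegaly", "pleural effusion", "no acute findings"]
labels = np.array([[1, 0, 0], [0, 0, 1], [0, 0, 1], [1, 1, 0]], dtype=float)
w = frequency_weights(labels)  # "pleural effusion" (rarest) gets the largest weight
print(keywords_to_prompt([0.9, 0.2, 0.1], vocab))
```

In a real pipeline the probabilities would come from the image classifier and the prompt would be fed to a pre-trained text-to-text model; the design point is that the keyword list, unlike an encoder embedding, is directly inspectable by a radiologist.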
We evaluate our method using two public datasets, IU-XRay and MIMIC-CXR, demonstrating superior performance over state-of-the-art methods. Our framework not only improves the accuracy and reliability of radiology report generation but also enhances the explainability of the process, fostering greater trust and adoption of AI-driven solutions in clinical practice. Comprehensive ablation studies confirm the robustness and effectiveness of each component, highlighting the significant contributions of our framework to advancing automated radiology reporting.
In conclusion, we developed a novel deep learning-based radiology report generation method that prepares high-quality, explainable radiology reports for chest X-ray images using multi-label classification and a text-to-text large language model. Our method addresses the lack of explainability in the current workflow and provides a clear, flexible automated pipeline to reduce the workload of radiologists and to support further applications involving human-AI interactive communication.