Department of Health Information Management and Medical Informatics, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran.
Acad Radiol. 2024 Dec;31(12):4823-4832. doi: 10.1016/j.acra.2024.07.020. Epub 2024 Aug 13.
The process of generating radiology reports is often time-consuming and labor-intensive, and the resulting reports are prone to incompleteness, heterogeneity, and errors. Using natural language processing (NLP)-based techniques, this study explores the potential of ChatGPT (Generative Pre-trained Transformer), a prominent large language model (LLM), to improve the efficiency of radiology report generation.
Using a sample of 1000 records from the Medical Information Mart for Intensive Care (MIMIC) Chest X-ray Database, this investigation used Claude.ai to extract keywords from the initial radiology reports. ChatGPT then generated radiology reports from a consistent three-step prompt template. Various lexical and sentence-similarity techniques were used to evaluate the correspondence between the AI-generated reports and reference reports authored by medical professionals.
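As an illustration of the evaluation step, the sketch below scores the correspondence between a generated and a reference report with a sentence-embedding similarity measure. The study does not publish its code, so the library, model name, and report texts here are assumptions for demonstration only.

```python
# A minimal sketch of the report-comparison step, assuming a
# sentence-embedding similarity measure; the model choice and the
# report texts are illustrative, not the study's.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

reference = "Heart size is normal. Lungs are clear. No pleural effusion."
generated = "The cardiac silhouette is normal in size and the lungs are clear."

# Encode both reports and score their semantic correspondence.
emb_ref = model.encode(reference, convert_to_tensor=True)
emb_gen = model.encode(generated, convert_to_tensor=True)
similarity = util.cos_sim(emb_ref, emb_gen).item()
print(f"Sentence similarity: {similarity:.3f}")
```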
Results showed varying performance among NLP models: BART (Bidirectional and Auto-Regressive Transformers) and XLM (Cross-lingual Language Model) displayed high proficiency (mean similarity scores up to 99.3%), closely mirroring physician reports, whereas DeBERTa (Decoding-enhanced BERT with disentangled attention) and sequence-matching models scored lower, indicating weaker alignment with medical language. In the Impression section, the word-embedding model performed best, with a mean similarity of 84.4%, while other measures, such as the Jaccard index, performed less well.
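Two of the lexical measures named above can be sketched directly: the Jaccard index over word sets and a sequence-matching ratio via Python's standard difflib. The example reports are invented, and the simple whitespace tokenization is an assumption about the study's preprocessing.

```python
# Illustrative implementations of two lexical similarity measures:
# the Jaccard index over word sets and a difflib sequence-matching ratio.
from difflib import SequenceMatcher

def jaccard(a: str, b: str) -> float:
    """Jaccard index: |intersection| / |union| of the two token sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

reference = "no acute cardiopulmonary process"
generated = "no acute cardiopulmonary abnormality is identified"

print(f"Jaccard:          {jaccard(reference, generated):.3f}")
print(f"Sequence matcher: {SequenceMatcher(None, reference, generated).ratio():.3f}")
```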
Overall, the study highlights significant variations across NLP models in their ability to generate radiology reports consistent with medical professionals' language. Pairwise comparisons and Kruskal-Wallis tests confirmed these differences, emphasizing the need for careful selection and evaluation of NLP models in radiology report generation. This research underscores the potential of ChatGPT to streamline and improve the radiology reporting process, with implications for enhancing efficiency and accuracy in clinical practice.
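The statistical comparison can be sketched as follows, assuming per-report similarity scores grouped by model. The score arrays are toy values rather than the study's data, and the uncorrected Mann-Whitney U follow-up is one plausible reading of the pairwise comparisons the abstract mentions.

```python
# A sketch of the statistical comparison: a Kruskal-Wallis test across
# models, followed by pairwise Mann-Whitney U tests. Scores are toy values.
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

scores = {  # per-report similarity scores for three hypothetical models
    "bart":    [0.99, 0.98, 0.99, 0.97, 0.99],
    "xlm":     [0.98, 0.99, 0.97, 0.98, 0.99],
    "deberta": [0.71, 0.65, 0.74, 0.69, 0.70],
}

h, p = kruskal(*scores.values())
print(f"Kruskal-Wallis: H={h:.2f}, p={p:.4f}")

# Pairwise follow-up tests (uncorrected here; a multiple-comparison
# correction would normally be applied).
for a, b in combinations(scores, 2):
    _, p_pair = mannwhitneyu(scores[a], scores[b])
    print(f"{a} vs {b}: p={p_pair:.4f}")
```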