Kanzawa Jun, Yasaka Koichiro, Fujita Nana, Fujiwara Shin, Abe Osamu
Department of Radiology, The University of Tokyo Hospital, Tokyo, Japan.
Neuroradiology. 2024 Dec;66(12):2177-2183. doi: 10.1007/s00234-024-03427-7. Epub 2024 Jul 12.
This study aimed to investigate the performance of a fine-tuned large language model (LLM) in classifying brain MRI reports into pretreatment tumor, posttreatment tumor, and nontumor cases.
This retrospective study included 759, 284, and 164 brain MRI reports for the training, validation, and test datasets, respectively. Radiologists stratified the reports into three groups: nontumor (group 1), posttreatment tumor (group 2), and pretreatment tumor (group 3) cases. A pretrained Bidirectional Encoder Representations from Transformers (BERT) Japanese model was fine-tuned on the training dataset and evaluated on the validation dataset. The model that achieved the highest accuracy on the validation dataset was selected as the final model. Two additional radiologists independently classified the reports in the test dataset into the three groups, and the model's performance on the test dataset was compared with that of the two radiologists.
The fine-tuned LLM attained an overall accuracy of 0.970 (95% CI: 0.930-0.990). Its sensitivity for groups 1/2/3 was 1.000/0.864/0.978, and its specificity for groups 1/2/3 was 0.991/0.993/0.958. No statistically significant differences were found between the LLM and the human readers in accuracy, sensitivity, or specificity (p ≥ 0.371). The LLM completed the classification task approximately 20- to 26-fold faster than the radiologists. The area under the receiver operating characteristic curve was 0.994 (95% CI: 0.982-1.000) for discriminating groups 2 and 3 from group 1, and 0.992 (95% CI: 0.982-1.000) for discriminating group 3 from groups 1 and 2.
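Per-group sensitivity and specificity in a three-class task such as this are typically computed one-vs-rest from the confusion matrix. A minimal sketch of that computation, using illustrative counts rather than the study's actual data:

```python
import numpy as np

# Hypothetical 3x3 confusion matrix (rows = true group, cols = predicted group);
# the counts below are illustrative, not the study's results.
cm = np.array([
    [90,  0,  0],   # group 1 (nontumor)
    [ 1, 19,  2],   # group 2 (posttreatment tumor)
    [ 0,  1, 45],   # group 3 (pretreatment tumor)
])

def per_group_metrics(cm):
    """One-vs-rest sensitivity and specificity for each group."""
    total = cm.sum()
    metrics = {}
    for g in range(cm.shape[0]):
        tp = cm[g, g]                 # correctly labeled as group g
        fn = cm[g].sum() - tp         # group g cases labeled otherwise
        fp = cm[:, g].sum() - tp      # other cases labeled as group g
        tn = total - tp - fn - fp     # everything else
        metrics[g + 1] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return metrics

accuracy = np.trace(cm) / cm.sum()
print(f"accuracy = {accuracy:.3f}")
for g, m in per_group_metrics(cm).items():
    print(f"group {g}: sens = {m['sensitivity']:.3f}, spec = {m['specificity']:.3f}")
```

Overall accuracy is the trace of the matrix divided by the total case count; each group's specificity treats all other groups as a single negative class.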
The fine-tuned LLM demonstrated performance comparable to that of radiologists in classifying brain MRI reports while requiring substantially less time.