Sato Mai, Yasaka Koichiro, Abe Shimon, Kurashima Joji, Asari Yusuke, Kiryu Shigeru, Abe Osamu
The University of Tokyo, Tokyo, Japan.
International University of Health and Welfare, Ōtawara, Japan.
Abdom Radiol (NY). 2025 Jun 11. doi: 10.1007/s00261-025-05062-z.
Appropriate categorization based on magnetic resonance imaging (MRI) findings is important for managing intraductal papillary mucinous neoplasms (IPMNs). In this study, a large language model (LLM) that classifies IPMNs based on MRI findings was developed, and its performance was compared with that of less experienced human readers.
The medical image management and processing systems of our hospital were searched to identify MRI reports of branch-duct IPMNs (BD-IPMNs). The reports were assigned to training, validation, and test datasets in chronological order. The model was trained on the training dataset, and the model that performed best on the validation dataset was evaluated on the test dataset. In addition, two radiology residents (Readers 1 and 2) and an intern (Reader 3) manually categorized the reports in the test dataset. The accuracy, sensitivity, and time required for categorization were compared between the model and the readers.
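The chronological assignment described above can be illustrated with a minimal sketch. The abstract does not state the split proportions or the report format, so the 70/15/15 percentages, the `date` field, and the function name `chronological_split` are all assumptions made for illustration only.

```python
from datetime import date


def chronological_split(reports, train_pct=70, val_pct=15):
    """Sort reports by date, then cut them into train/validation/test.

    The percentages are hypothetical; the abstract does not report them.
    Integer arithmetic avoids floating-point rounding in the cut points.
    """
    ordered = sorted(reports, key=lambda r: r["date"])
    n = len(ordered)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train = ordered[:n_train]
    val = ordered[n_train:n_train + n_val]
    test = ordered[n_train + n_val:]  # the most recent reports
    return train, val, test


# Hypothetical toy reports, one per month, deliberately given in reverse order.
reports = [
    {"id": i, "date": date(2020 + i // 12, i % 12 + 1, 1)}
    for i in reversed(range(20))
]
train, val, test = chronological_split(reports)
```

Splitting by date rather than at random means the test set contains only reports written after those used for training, which resembles how such a model would be used on future reports.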
The accuracy of the fine-tuned LLM on the test dataset was 0.966, which was comparable to that of Readers 1 and 2 (0.931-0.972) and significantly higher than that of Reader 3 (0.907). The fine-tuned LLM had an area under the receiver operating characteristic curve of 0.982 for the classification of cyst diameter ≥ 10 mm, which was significantly higher than that of Reader 3 (0.944). Furthermore, the fine-tuned LLM processed the test dataset in 25 s, whereas the readers required 1,887-2,646 s.
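For readers unfamiliar with the metrics reported above, the following sketch shows how accuracy, sensitivity, and the area under the receiver operating characteristic curve could be computed for a single binary finding such as cyst diameter ≥ 10 mm. The labels and scores are invented toy values, not study data, and the helper names are hypothetical; the study's own evaluation code is not described in the abstract.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases where the predicted label equals the reference."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred, positive=1):
    """True positives divided by all reference-positive cases."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: 1 = finding present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
sens = sensitivity(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy and sensitivity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is presumably why it is reported separately.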
The fine-tuned LLM classified BD-IPMNs based on MRI findings with performance comparable to that of radiology residents and significantly reduced the time required.