Benchmarking a Foundation Large Language Model on its Ability to Relabel Structure Names in Accordance With the American Association of Physicists in Medicine Task Group-263 Report.

Author affiliations

Department of Radiation Oncology, Mayo Clinic, Phoenix, Arizona.

Publication information

Pract Radiat Oncol. 2024 Nov-Dec;14(6):e515-e521. doi: 10.1016/j.prro.2024.04.017. Epub 2024 Sep 5.

Abstract

PURPOSE

To introduce the concept of using large language models (LLMs) to relabel structure names in accordance with the American Association of Physicists in Medicine Task Group-263 standard and to establish a benchmark for future studies to reference.

METHODS AND MATERIALS

Generative Pretrained Transformer (GPT)-4 was implemented within a Digital Imaging and Communications in Medicine (DICOM) server. On receiving a DICOM structure-set file, the server prompts GPT-4 to relabel the structure names according to the American Association of Physicists in Medicine Task Group-263 (TG-263) report. Results were evaluated for 3 disease sites: prostate, head and neck, and thorax. For each disease site, 150 patients were randomly selected for manual tuning of the instruction prompt (in batches of 50), and 50 patients were randomly selected for evaluation. The structure names considered were those most likely to be relevant to studies that use structure contours from many patients.
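
The relabeling step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the JSON reply format, and the names `build_relabel_prompt`, `relabel`, and `fake_llm` are all assumptions introduced here. The model call is passed in as a plain callable so the sketch runs without any network access or DICOM parsing.

```python
import json
from typing import Callable, Dict, List, Optional


def build_relabel_prompt(names: List[str], site: str) -> str:
    """Assemble an instruction prompt (illustrative wording, not the study's tuned prompt)."""
    listing = "\n".join(f"- {name}" for name in names)
    return (
        "Relabel the following radiotherapy structure names according to "
        "the AAPM TG-263 report.\n"
        f"Disease site: {site}\n"
        "Reply with a JSON object mapping each original name to its TG-263 "
        "name, or to null if the structure should not be relabeled.\n"
        f"Structure names:\n{listing}"
    )


def relabel(
    names: List[str], site: str, ask_llm: Callable[[str], str]
) -> Dict[str, Optional[str]]:
    """Send one prompt per structure set and validate the reply."""
    reply = ask_llm(build_relabel_prompt(names, site))
    try:
        mapping = json.loads(reply)
    except json.JSONDecodeError:
        mapping = {}  # unparseable reply: treat every name as unlabeled
    if not isinstance(mapping, dict):
        mapping = {}
    result: Dict[str, Optional[str]] = {}
    for name in names:
        # Ignore keys the model invented; keep only string labels for requested names.
        value = mapping.get(name)
        result[name] = value if isinstance(value, str) else None
    return result


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for the GPT-4 call; returns a canned reply."""
    return json.dumps(
        {"prostate ctv": "CTV", "bladder_new": "Bladder", "extra": "Rectum"}
    )


if __name__ == "__main__":
    print(relabel(["prostate ctv", "bladder_new", "couch"], "prostate", fake_llm))
```

Passing the model call as a parameter keeps the validation logic testable on its own; in a real server the callable would wrap whatever LLM client is in use, and the structure names would be read from the incoming structure-set file.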

RESULTS

The per-patient accuracy was 97.2%, 98.3%, and 97.1% for prostate, head and neck, and thorax disease sites, respectively. On a per-structure basis, the clinical target volume was relabeled correctly in 100%, 95.3%, and 92.9% of cases, respectively.
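
The abstract does not define the two accuracy measures precisely. The sketch below shows one plausible reading, stated as an assumption: per-structure accuracy is the fraction of cases in which a given structure type was relabeled correctly, and per-patient accuracy is the mean, over patients, of the fraction of that patient's considered structures relabeled correctly. The function names and the toy records are made up for illustration and are not study data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One record per evaluated structure:
# (patient_id, structure_type, predicted_label, reference_label)
Record = Tuple[str, str, str, str]


def per_structure_accuracy(records: List[Record]) -> Dict[str, float]:
    """Fraction of correct relabels for each structure type (assumed definition)."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for _, structure, predicted, reference in records:
        totals[structure] += 1
        hits[structure] += int(predicted == reference)
    return {s: hits[s] / totals[s] for s in totals}


def per_patient_accuracy(records: List[Record]) -> float:
    """Mean over patients of each patient's fraction of correct relabels (assumed definition)."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for patient, _, predicted, reference in records:
        totals[patient] += 1
        hits[patient] += int(predicted == reference)
    fractions = [hits[p] / totals[p] for p in totals]
    return sum(fractions) / len(fractions)


# Toy records for illustration only.
toy_records: List[Record] = [
    ("p1", "CTV", "CTV", "CTV"),
    ("p1", "Bladder", "Bladder", "Bladder"),
    ("p2", "CTV", "PTV", "CTV"),          # one wrong relabel
    ("p2", "Bladder", "Bladder", "Bladder"),
]

if __name__ == "__main__":
    print(per_structure_accuracy(toy_records))
    print(per_patient_accuracy(toy_records))
```

Under this reading the two numbers can differ, because a single difficult structure type lowers its own per-structure figure more than it lowers the per-patient average.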

CONCLUSIONS

Given the accuracy of GPT-4 in relabeling structure names as presented in this work, LLMs are poised to become an important method for standardizing structure names in radiation oncology, especially considering the rapid advancements in LLM capabilities that are likely to continue.
