Ghosh Adarsh, Li Hailong, Trout Andrew T
Department of Radiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio (A.G., H.L., A.T.T.); Department of Radiology, Nationwide Children's Hospital, Columbus, Ohio (A.G.).
Department of Radiology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio (A.G., H.L., A.T.T.); Department of Radiology, University of Cincinnati College of Medicine, Cincinnati, Ohio (H.L., A.T.T.); Imaging Research Center, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio (H.L.).
Acad Radiol. 2025 Feb;32(2):604-611. doi: 10.1016/j.acra.2024.09.042. Epub 2024 Oct 15.
Original research in radiology often involves handling large datasets, data manipulation, statistical tests, and coding. Recent studies show that large language models (LLMs) can solve bioinformatics tasks, suggesting their potential in radiology research. This study evaluates the ability of LLMs to provide statistical and deep learning solutions and code for radiology research.
We used the web-based chat interfaces of ChatGPT-4o, ChatGPT-3.5, and Google Gemini. EXPERIMENT 1: BIOSTATISTICS AND DATA VISUALIZATION: We assessed each LLM's ability to suggest biostatistical tests and generate the corresponding R code using a Cancer Imaging Archive dataset. Prompts were based on statistical analyses from a peer-reviewed manuscript. The generated code was tested in RStudio for correctness, runtime errors, and the ability to generate the requested visualization. EXPERIMENT 2: DEEP LEARNING: We used the RSNA-STR Pneumonia Detection Challenge dataset to evaluate the ability of ChatGPT-4o and Gemini to generate Python code for transformer-based image classification models (Vision Transformer ViT-B/16). The generated code was tested in a Jupyter Notebook for functionality and runtime errors.
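The testing step described above (run the generated code, note whether it fails at runtime, and keep the error text) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study, where code was run by hand in RStudio and a Jupyter Notebook. The helper name `run_generated_code` and the two sample snippets are hypothetical.

```python
import traceback


def run_generated_code(source: str) -> dict:
    """Execute a code string and report whether it raised an error.

    Hypothetical helper for illustration only. exec() runs arbitrary code,
    so this should only be used on code you have read, in a sandbox.
    """
    namespace = {}
    try:
        # compile() is inside the try block so syntax errors are captured too.
        exec(compile(source, "<generated>", "exec"), namespace)
    except Exception:
        # Keep the full traceback text so it can be pasted back into the chat.
        return {"ok": False, "error": traceback.format_exc(), "namespace": namespace}
    return {"ok": True, "error": None, "namespace": namespace}


# Two made-up snippets standing in for LLM output: one runs, one fails.
good = run_generated_code("total = sum([1, 2, 3])")
bad = run_generated_code("ratio = 1 / 0")

print(good["ok"], bad["ok"])                   # True False
print(bad["error"].strip().splitlines()[-1])   # ZeroDivisionError: division by zero
```

The captured `error` string is the kind of text that would be copied back into the chat interface when asking the model to fix its code.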
Of the 8 statistical questions posed, correct statistical answers were suggested for 7 (ChatGPT-4o), 6 (ChatGPT-3.5), and 5 (Gemini) scenarios. The R code output by ChatGPT-4o had fewer runtime errors (6 out of the 7 total codes provided) compared to ChatGPT-3.5 (5/7) and Gemini (5/7). Both ChatGPT-4o and Gemini were able to generate the requested visualization, with a few runtime errors. Iteratively copying runtime errors from the code generated by ChatGPT-4o into the chat helped resolve them. Gemini initially hallucinated during code generation but was able to provide accurate code on restarting the experiment. ChatGPT-4o and Gemini successfully generated initial Python code for deep learning tasks. Errors encountered during implementation were resolved through iterations using the chat interface, demonstrating the utility of LLMs in providing baseline code for further refinement and in resolving runtime errors.
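For readers who prefer proportions, the counts of correct statistical answers reported above convert as follows. The percentages are derived here by simple arithmetic from the stated counts (out of 8 questions) and are not quoted from the source. The variable and function names are illustrative.

```python
from fractions import Fraction

# Counts of correct statistical answers out of 8 questions, as reported above.
TOTAL_QUESTIONS = 8
correct_answers = {"ChatGPT-4o": 7, "ChatGPT-3.5": 6, "Gemini": 5}


def percent_correct(correct: int, total: int) -> float:
    """Return the share of correct answers as a percentage."""
    return float(Fraction(correct, total) * 100)


rates = {
    model: percent_correct(count, TOTAL_QUESTIONS)
    for model, count in correct_answers.items()
}

for model, rate in rates.items():
    print(f"{model}: {correct_answers[model]}/{TOTAL_QUESTIONS} = {rate:.1f}%")
# ChatGPT-4o: 7/8 = 87.5%
# ChatGPT-3.5: 6/8 = 75.0%
# Gemini: 5/8 = 62.5%
```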
LLMs can assist in coding tasks for radiology research, providing initial code for data visualization, statistical tests, and deep learning models, and helping researchers who have foundational biostatistical knowledge. While LLMs can offer a useful starting point, they require users to refine and validate the code, and caution is necessary due to potential errors, the risk of hallucinations, and data privacy regulations.
LLMs can help with coding and statistical problems in radiology research. This can help primary authors troubleshoot the coding needed in radiology research.