The potential of GPT-4 advanced data analysis for radiomics-based machine learning models.

Author information

Foltyn-Dumitru Martha, Rastogi Aditya, Cho Jaeyoung, Schell Marianne, Mahmutoglu Mustafa Ahmed, Kessler Tobias, Sahm Felix, Wick Wolfgang, Bendszus Martin, Brugnara Gianluca, Vollmuth Philipp

Affiliations

Division for Computational Radiology & Clinical AI (CCIBonn.ai), Department of Neuroradiology, Bonn University Hospital, Bonn, Germany.

Division for Computational Neuroimaging, Heidelberg University Hospital, Heidelberg, Germany.

Publication information

Neurooncol Adv. 2024 Dec 23;7(1):vdae230. doi: 10.1093/noajnl/vdae230. eCollection 2025 Jan-Dec.

Abstract

BACKGROUND

This study aimed to explore the potential of the Advanced Data Analytics (ADA) package of GPT-4 to autonomously develop machine learning models (MLMs) for predicting glioma molecular types using radiomics from MRI.

METHODS

Radiomic features were extracted from preoperative MRI of n = 615 newly diagnosed glioma patients to predict glioma molecular types (IDH-wildtype vs IDH-mutant 1p19q-codeleted vs IDH-mutant 1p19q-non-codeleted) with a multiclass ML approach. Specifically, ADA was used to autonomously develop an ML pipeline and benchmark performance against an established handcrafted model using various MRI normalization methods (N4, Z-score, and WhiteStripe). External validation was performed on 2 public glioma datasets, D2 (n = 160) and D3 (n = 410).
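Of the normalization methods listed, Z-score is the simplest to illustrate. The following is a minimal sketch, not the study's pipeline: the function name `zscore_normalize`, the flat voxel list, and the optional brain mask are assumptions made for illustration, and the intensity values are invented.

```python
from statistics import fmean, pstdev


def zscore_normalize(intensities, mask=None):
    """Rescale intensities to zero mean and unit standard deviation.

    The mean and standard deviation are computed only over voxels where
    `mask` is True (for example, voxels inside the brain); the rescaling
    is then applied to every voxel. Hypothetical helper for illustration.
    """
    if mask is None:
        mask = [True] * len(intensities)
    selected = [v for v, keep in zip(intensities, mask) if keep]
    mu = fmean(selected)
    sigma = pstdev(selected)
    if sigma == 0:
        raise ValueError("constant intensities cannot be z-scored")
    return [(v - mu) / sigma for v in intensities]


# Invented example values: background voxels (0.0) are excluded from the statistics.
voxels = [0.0, 100.0, 120.0, 140.0, 0.0]
brain_mask = [False, True, True, True, False]
normalized = zscore_normalize(voxels, brain_mask)
```

Computing the statistics inside a mask keeps background voxels from dominating the mean, which is one common design choice; whether the study did this is not stated in the abstract.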

RESULTS

GPT-4 achieved the highest accuracy of 0.820 (95% CI = 0.819-0.821) on the D3 dataset with N4/WS normalization, significantly outperforming the benchmark model's accuracy of 0.678 (95% CI = 0.677-0.680) (P < .001). Class-wise analysis showed performance variations across different glioma types. In the IDH-wildtype group, GPT-4 had a recall of 0.997 (95% CI = 0.997-0.997), surpassing the benchmark's 0.742 (95% CI = 0.740-0.743). For the IDH-mut 1p/19q-non-codel group, GPT-4's recall was 0.275 (95% CI = 0.272-0.279), lower than the benchmark's 0.426 (95% CI = 0.423-0.430). In the IDH-mut 1p/19q-codel group, GPT-4's recall was 0.199 (95% CI = 0.191-0.206), below the benchmark's 0.730 (95% CI = 0.721-0.738). On the D2 dataset, GPT-4's accuracy with N4/WS normalization was significantly lower than the benchmark's: 0.668 (95% CI = 0.666-0.671) compared with 0.719 (95% CI = 0.717-0.722) (P < .001). Class-wise analysis revealed the same pattern as observed in D3.
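The combination of high overall accuracy and low recall in the two IDH-mutant classes is what one would expect when one class dominates the dataset. The sketch below illustrates the arithmetic with a confusion matrix whose counts are entirely hypothetical (they are not the study's data); the helper name `accuracy_and_recall` is likewise an assumption.

```python
def accuracy_and_recall(confusion):
    """Return overall accuracy and per-class recall.

    confusion[i][j] is the number of cases whose true class is i
    and whose predicted class is j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    accuracy = correct / total
    # Recall for class i: correctly predicted cases of class i / all true cases of class i.
    recall = [row[i] / sum(row) for i, row in enumerate(confusion)]
    return accuracy, recall


labels = ["IDH-wildtype", "IDH-mut 1p/19q-non-codel", "IDH-mut 1p/19q-codel"]

# Hypothetical counts for 100 patients, chosen only to show the effect of imbalance.
confusion = [
    [78, 1, 1],  # 80 true IDH-wildtype cases
    [9, 3, 0],   # 12 true IDH-mut 1p/19q-non-codel cases
    [6, 0, 2],   # 8 true IDH-mut 1p/19q-codel cases
]

accuracy, recall = accuracy_and_recall(confusion)
# Balanced accuracy weights every class equally, regardless of its size.
balanced_accuracy = sum(recall) / len(recall)

for name, value in zip(labels, recall):
    print(f"{name}: recall = {value:.3f}")
print(f"accuracy = {accuracy:.3f}, balanced accuracy = {balanced_accuracy:.3f}")
```

In this invented example the accuracy is 0.83 even though only a quarter of each minority class is recovered, which is why a class-weighted summary such as balanced accuracy is often reported alongside accuracy for imbalanced data.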

CONCLUSIONS

GPT-4 can autonomously develop radiomics-based MLMs, achieving performance comparable to handcrafted MLMs. However, its poorer class-wise performance due to unbalanced datasets shows limitations in handling complete end-to-end ML pipelines.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9bc8/11707530/51fbcc6d4693/vdae230_fig1.jpg
