Comparative Analysis of Generative Pre-Trained Transformer Models in Oncogene-Driven Non-Small Cell Lung Cancer: Introducing the Generative Artificial Intelligence Performance Score.

Author information

Hamilton Zacharie, Aseem Aseem, Chen Zhengjia, Naffakh Noor, Reizine Natalie M, Weinberg Frank, Jain Shikha, Kessler Larry G, Gadi Vijayakrishna K, Bun Christopher, Nguyen Ryan H

Affiliations

University of Illinois Chicago, Chicago, IL.

University of Washington, Seattle, WA.

Publication information

JCO Clin Cancer Inform. 2024 Dec;8:e2400123. doi: 10.1200/CCI.24.00123. Epub 2024 Dec 11.

Abstract

PURPOSE

Precision oncology in non-small cell lung cancer (NSCLC) relies on biomarker testing for clinical decision making. Despite its importance, challenges like the lack of genomic oncology training, nonstandardized biomarker reporting, and a rapidly evolving treatment landscape hinder its practice. Generative artificial intelligence (AI), such as ChatGPT, offers promise for enhancing clinical decision support. Effective performance metrics are crucial to evaluate these models' accuracy and their propensity for producing incorrect or hallucinated information. We assessed various ChatGPT versions' ability to generate accurate next-generation sequencing reports and treatment recommendations for NSCLC, using a novel Generative AI Performance Score (G-PS), which considers accuracy, relevancy, and hallucinations.

METHODS

We queried ChatGPT versions for first-line NSCLC treatment recommendations with a Food and Drug Administration-approved targeted therapy, using a zero-shot prompt approach for eight oncogenes. Responses were assessed against National Comprehensive Cancer Network (NCCN) guidelines for accuracy, relevance, and hallucinations, with G-PS calculating scores from -1 (all hallucinations) to 1 (fully NCCN-compliant recommendations). G-PS was designed as a composite measure with a base score for correct recommendations (weighted for preferred treatments) and a penalty for hallucinations.
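The abstract does not give the G-PS formula, so the following is only an illustrative sketch of how a composite score with a weighted base and a hallucination penalty could be bounded between -1 and 1. The weights (`PREFERRED_WEIGHT`, `OTHER_CORRECT_WEIGHT`), the `ResponseCounts` type, and the function `gps_sketch` are hypothetical names and values chosen for illustration. They are not taken from the paper and should not be read as the authors' method.

```python
from dataclasses import dataclass

# Hypothetical weights (assumptions, NOT from the paper): a preferred
# NCCN treatment earns full credit, any other NCCN-listed option earns half.
PREFERRED_WEIGHT = 1.0
OTHER_CORRECT_WEIGHT = 0.5


@dataclass
class ResponseCounts:
    """Counts of treatments named in a single model response."""
    preferred: int      # NCCN-preferred treatments
    other_correct: int  # NCCN-listed but non-preferred treatments
    hallucinated: int   # treatments not supported by NCCN guidelines


def gps_sketch(r: ResponseCounts) -> float:
    """Illustrative composite score: weighted base minus hallucination penalty.

    Returns -1.0 when every item is hallucinated and 1.0 when every item
    is an NCCN-preferred treatment. An empty response scores 0.0, which is
    a further assumption of this sketch.
    """
    total = r.preferred + r.other_correct + r.hallucinated
    if total == 0:
        return 0.0
    base = (PREFERRED_WEIGHT * r.preferred
            + OTHER_CORRECT_WEIGHT * r.other_correct) / total
    penalty = r.hallucinated / total
    return base - penalty
```

Under these assumed weights, a response naming one preferred treatment and one hallucinated treatment would score 0, since the penalty cancels the base. The published G-PS may weight and aggregate responses differently.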

RESULTS

Analyzing 160 responses, generative pre-trained transformer (GPT)-4 outperformed GPT-3.5, showing a higher base score (90% v 60%; P < .01) and fewer hallucinations (34% v 53%; P < .01). GPT-4's overall G-PS was significantly higher (0.34 v -0.15; P < .01), indicating superior performance.

CONCLUSION

This study highlights the rapid improvement of generative AI in matching treatment recommendations with biomarkers in precision oncology. Although the rate of hallucinations improved in the GPT-4 model, future generative AI use in clinical care requires high levels of accuracy with minimal to no room for hallucinations. The G-PS represents a novel metric quantifying generative AI utility in health care compared with national guidelines, with potential adaptation beyond precision oncology.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/207f/11634130/520050014c1f/cci-8-e2400123-g001.jpg
