Probabilistic medical predictions of large language models.

Author information

Gu Bowen, Desai Rishi J, Lin Kueiyu Joshua, Yang Jie

Affiliations

Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.

Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA.

Publication information

NPJ Digit Med. 2024 Dec 19;7(1):367. doi: 10.1038/s41746-024-01366-4.

Abstract

Large Language Models (LLMs) have shown promise in clinical applications through prompt engineering, allowing flexible clinical predictions. However, they struggle to produce reliable prediction probabilities, which are crucial for transparency and decision-making. While explicit prompts can lead LLMs to generate probability estimates, their numerical reasoning limitations raise concerns about reliability. We compared explicit probabilities from text generation to implicit probabilities derived from the likelihood of predicting the correct label token. Across six advanced open-source LLMs and five medical datasets, explicit probabilities consistently underperformed implicit probabilities in discrimination, precision, and recall. This discrepancy is more pronounced with smaller LLMs and imbalanced datasets, highlighting the need for cautious interpretation, improved probability estimation methods, and further research for clinical use of LLMs.
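The distinction between the two kinds of probability can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the "Yes"/"No" label tokens, the logit values, and the choice to renormalize over only the candidate label tokens are all assumptions made for illustration. The implicit probability is computed from the model's scores for the label tokens, while the explicit probability is parsed from the number the model writes in its generated text.

```python
import math
import re


def implicit_probability(label_logits, positive_label="Yes"):
    """Softmax over the logits of the candidate label tokens.

    Assumption: `label_logits` maps each label token (e.g. "Yes", "No")
    to the logit the model assigned to it at the answer position, and
    the probability is renormalized over those tokens only.
    """
    peak = max(label_logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in label_logits.items()}
    return exps[positive_label] / sum(exps.values())


def explicit_probability(generated_text):
    """Parse the first number in the generated text as a probability.

    Accepts a fraction ("0.73") or a percentage ("85%"). Returns None
    when no number is found or the value falls outside [0, 1].
    """
    match = re.search(r"(\d+(?:\.\d+)?)\s*(%?)", generated_text)
    if match is None:
        return None
    value = float(match.group(1))
    if match.group(2) == "%":
        value /= 100.0
    return value if 0.0 <= value <= 1.0 else None


# Hypothetical logits and generated answers, for illustration only.
p_implicit = implicit_probability({"Yes": 2.0, "No": 0.0})
p_explicit = explicit_probability("The probability is 0.73.")
```

The explicit route depends on the model writing a well-formed number, so it can fail to parse or return an out-of-range value, whereas the implicit route always yields a value in (0, 1).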

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6038/11659327/dd423338c8c5/41746_2024_1366_Fig1_HTML.jpg
