Systematic benchmarking demonstrates large language models have not reached the diagnostic accuracy of traditional rare-disease decision support tools.

Authors

Reese Justin T, Chimirri Leonardo, Bridges Yasemin, Danis Daniel, Caufield J Harry, Wissink Kyran, McMurry Julie A, Graefe Adam SL, Casiraghi Elena, Valentini Giorgio, Jacobsen Julius OB, Haendel Melissa, Smedley Damian, Mungall Christopher J, Robinson Peter N

Affiliations

Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.

Monarch Initiative.

Publication information

medRxiv. 2024 Nov 7:2024.07.22.24310816. doi: 10.1101/2024.07.22.24310816.

Abstract

Large language models (LLMs) show promise in supporting differential diagnosis, but their performance is challenging to evaluate due to the unstructured nature of their responses. To assess the current capabilities of LLMs to diagnose genetic diseases, we benchmarked these models on 5,213 case reports using the Phenopacket Schema, the Human Phenotype Ontology and Mondo disease ontology. Prompts generated from each phenopacket were sent to three generative pretrained transformer (GPT) models. The same phenopackets were used as input to a widely used diagnostic tool, Exomiser, in phenotype-only mode. The best LLM ranked the correct diagnosis first in 23.6% of cases, whereas Exomiser did so in 35.5% of cases. While the performance of LLMs for supporting differential diagnosis has been improving, it has not reached the level of commonly used traditional bioinformatics tools. Future research is needed to determine the best approach to incorporate LLMs into diagnostic pipelines.
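The headline comparison in the abstract (how often each tool ranks the correct diagnosis first) is a top-k accuracy measure. A minimal sketch of that calculation follows, assuming each case has already been reduced to an ordered list of candidate disease identifiers plus the one correct identifier. The function names, the `MONDO:`-style placeholder identifiers, and the toy cases are hypothetical illustrations, not the study's code or data.

```python
from typing import Optional, Sequence, Tuple

# One case: (ranked candidate disease IDs, correct disease ID).
Case = Tuple[Sequence[str], str]


def rank_of_correct(ranked_ids: Sequence[str], correct_id: str) -> Optional[int]:
    """Return the 1-based rank of correct_id in ranked_ids, or None if absent."""
    for position, candidate in enumerate(ranked_ids, start=1):
        if candidate == correct_id:
            return position
    return None


def top_k_accuracy(cases: Sequence[Case], k: int) -> float:
    """Fraction of cases whose correct diagnosis is within the first k candidates."""
    if not cases:
        raise ValueError("need at least one case")
    hits = 0
    for ranked_ids, correct_id in cases:
        rank = rank_of_correct(ranked_ids, correct_id)
        if rank is not None and rank <= k:
            hits += 1
    return hits / len(cases)


# Toy cases with placeholder identifiers (NOT real Mondo IDs, NOT study data).
toy_cases = [
    (["MONDO:A", "MONDO:B", "MONDO:C"], "MONDO:A"),  # correct at rank 1
    (["MONDO:D", "MONDO:E"], "MONDO:E"),             # correct at rank 2
    (["MONDO:F", "MONDO:G"], "MONDO:H"),             # correct diagnosis missing
    (["MONDO:I"], "MONDO:I"),                        # correct at rank 1
]

print(top_k_accuracy(toy_cases, k=1))  # 0.5
print(top_k_accuracy(toy_cases, k=3))  # 0.75
```

Comparing candidates by ontology identifier rather than by free-text disease name is what makes an LLM's unstructured answer comparable with a tool's ranked output. The step that maps LLM text to identifiers is outside this sketch.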

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c28d/11563241/53d6a99cf512/nihpp-2024.07.22.24310816v2-f0001.jpg
