Diagnostic Accuracy of a Custom Large Language Model on Rare Pediatric Disease Case Reports.

Author Information

Young Cameron C, Enichen Ellie, Rivera Christian, Auger Corinne A, Grant Nathan, Rao Arya, Succi Marc D

Affiliations

Harvard Medical School, Boston, Massachusetts, USA.

Medically Engineered Solutions in Healthcare Incubator, Innovation in Operations Research Center, Mass General Brigham, Boston, Massachusetts, USA.

Publication Information

Am J Med Genet A. 2025 Feb;197(2):e63878. doi: 10.1002/ajmg.a.63878. Epub 2024 Sep 13.

DOI:10.1002/ajmg.a.63878

PMID:39268988

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12123583/

Abstract

Accurately diagnosing rare pediatric diseases frequently represents a clinical challenge due to their complex and unusual clinical presentations. Here, we explore the capabilities of three large language models (LLMs), GPT-4, Gemini Pro, and a custom-built LLM (GPT-4 integrated with the Human Phenotype Ontology [GPT-4 HPO]), by evaluating their diagnostic performance on 61 rare pediatric disease case reports. The LLMs were assessed on their accuracy in identifying the specific diagnosis, listing the correct diagnosis within a differential list, and identifying the broad disease category. In addition, GPT-4 HPO was tested on 100 general pediatrics case reports previously assessed on other LLMs to further validate its performance. The results indicated that GPT-4 predicted the correct diagnosis with a diagnostic accuracy of 13.1%, whereas GPT-4 HPO and Gemini Pro both had diagnostic accuracies of 8.2%. Further, GPT-4 HPO showed improved performance compared with the other two LLMs in listing the correct diagnosis within its differential list and in identifying the broad disease category. Although these findings underscore the potential of LLMs for diagnostic support, particularly when enhanced with domain-specific ontologies, they also stress the need for further improvement prior to integration into clinical practice.
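The abstract describes three accuracy measures (exact diagnosis, correct diagnosis within the differential list, broad disease category) but not how each case was scored. The following is a minimal sketch of how such per-case scores could be tallied into percentages, assuming each case is reduced to simple string labels. The `CaseResult` schema and the `score` and `accuracy` helpers are hypothetical, and exact string matching stands in for whatever adjudication the authors actually used. Because the denominator is 61 cases, the reported rates correspond to whole-case counts: 8/61 rounds to 13.1% and 5/61 rounds to 8.2%.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """One model's output on one case report (hypothetical schema)."""
    true_diagnosis: str
    true_category: str
    top_diagnosis: str        # model's single most likely diagnosis
    differential: list[str]   # model's ranked differential list
    predicted_category: str   # model's broad disease category


def accuracy(hits: int, total: int) -> float:
    """Percentage of cases scored correct, rounded to one decimal place."""
    return round(100 * hits / total, 1)


def score(results: list[CaseResult]) -> dict[str, float]:
    """Tally the three accuracy measures.

    Assumption: a case counts as correct only on an exact string match;
    real evaluations typically rely on clinician review instead.
    """
    n = len(results)
    exact = sum(r.top_diagnosis == r.true_diagnosis for r in results)
    in_diff = sum(r.true_diagnosis in r.differential for r in results)
    category = sum(r.predicted_category == r.true_category for r in results)
    return {
        "exact": accuracy(exact, n),
        "differential": accuracy(in_diff, n),
        "category": accuracy(category, n),
    }


# Reported exact-diagnosis rates on the 61 rare-disease cases:
print(accuracy(8, 61))  # GPT-4 → 13.1
print(accuracy(5, 61))  # GPT-4 HPO and Gemini Pro → 8.2
```

Keeping the three measures in one pass over the same cases makes it explicit that they share a denominator, so a model can rank lower on exact diagnosis yet higher on the differential and category measures, as GPT-4 HPO did here.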
