Lee Jung-Hyun, Choi Eunhee, Angulo Sergio L, McDougal Robert A, Lytton William W
Department of Neurology, State University of New York Downstate Health Sciences University, Brooklyn, NY, United States.
Department of Neurology, Kings County Hospital, Brooklyn, NY, United States.
Front Med (Lausanne). 2025 Jan 17;11:1496866. doi: 10.3389/fmed.2024.1496866. eCollection 2024.
We propose the use of GPT-4 to facilitate initial history-taking in neurology and other medical specialties. A large language model (LLM) could be used as a digital twin that enhances queryable electronic medical record (EMR) systems and provides healthcare conversational agents (HCAs) to replace waiting-room questionnaires.
In this observational pilot study, we presented verbatim history of present illness (HPI) narratives from published case reports of headache, stroke, and neurodegenerative diseases. Three standard GPT-4 models were designated Model P: patient digital twin; Model N: neurologist to query Model P; and Model S: supervisor to synthesize the N-P dialogue into a derived HPI and formulate the differential diagnosis. Given the random variability of GPT-4 output, each case was presented five separate times to check consistency and reliability.
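The three-role arrangement described above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the `DONE` stop signal, the `max_turns` limit, and the `chat(system_prompt, transcript)` interface are all hypothetical, and `scripted_chat` is a canned stand-in used here in place of a real GPT-4 call.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: chat(system_prompt, transcript) -> reply text.
ChatFn = Callable[[str, str], str]

# Assumed prompt wording; the abstract does not give the actual prompts.
P_PROMPT = "You are the patient. Answer only from this history:\n{hpi}"
N_PROMPT = "You are a neurologist taking a history. Ask one question per turn; reply DONE when finished."
S_PROMPT = "You are a supervisor. Summarize the dialogue as an HPI and give a ranked differential diagnosis."


def render(dialogue: List[Tuple[str, str]]) -> str:
    """Flatten (question, answer) pairs into a plain-text transcript."""
    return "\n".join(f"N: {q}\nP: {a}" for q, a in dialogue)


def run_case(hpi: str, chat: ChatFn, max_turns: int = 10) -> str:
    """Run one N-P dialogue, then have Model S synthesize it."""
    dialogue: List[Tuple[str, str]] = []
    for _ in range(max_turns):
        # Model N sees the dialogue so far and asks the next question.
        question = chat(N_PROMPT, render(dialogue))
        if question.strip() == "DONE":
            break
        # Model P answers using only the original HPI narrative.
        answer = chat(P_PROMPT.format(hpi=hpi), question)
        dialogue.append((question, answer))
    # Model S turns the dialogue into a derived HPI and a differential.
    return chat(S_PROMPT, render(dialogue))


def run_repeats(hpi: str, chat: ChatFn, repeats: int = 5) -> List[str]:
    """Present the same case several times to expose output variability."""
    return [run_case(hpi, chat) for _ in range(repeats)]


def scripted_chat(system_prompt: str, transcript: str) -> str:
    """Canned stand-in for an LLM call, for demonstration only."""
    if system_prompt.startswith("You are a neurologist"):
        return "DONE" if "P:" in transcript else "What brings you in?"
    if system_prompt.startswith("You are the patient"):
        return "Headache for three days."
    return "HPI: " + transcript


outputs = run_repeats("Three days of headache.", scripted_chat)
```

With a real model behind `chat`, the five outputs would generally differ from run to run, which is the variability the repeated presentations are meant to measure.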
The study achieved an overall HPI content retrieval accuracy of 81%, with accuracies of 84% for headache, 82% for stroke, and 77% for neurodegenerative diseases. Retrieval accuracies for individual HPI components were as follows: 93% for chief complaints, 47% for associated symptoms and review of systems, 76% for relevant symptom details, and 94% for past medical, surgical, allergy, social, and family histories. The case diagnosis ranked, on average, in the 89th percentile of the differential diagnosis list.
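The abstract does not say how retrieval accuracy was scored. One plausible reading, assumed here, is that each element of the source HPI is labeled with a component category and marked as retrieved or not in the derived HPI. Under that assumption, the sketch below computes per-category and pooled accuracy. The function name, the category labels, and the example items are hypothetical and are not study data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def retrieval_accuracy(
    items: Iterable[Tuple[str, bool]],
) -> Tuple[Dict[str, float], float]:
    """Return (per-category accuracy, pooled accuracy).

    items: (category, retrieved) pairs, one per element of the source HPI.
    """
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for category, retrieved in items:
        totals[category] += 1
        hits[category] += int(retrieved)
    per_category = {c: hits[c] / totals[c] for c in totals}
    # Pooled accuracy weights each category by its number of elements.
    overall = sum(hits.values()) / sum(totals.values())
    return per_category, overall


# Illustrative input only, not taken from the study.
example = [
    ("chief complaint", True),
    ("chief complaint", True),
    ("associated symptoms", True),
    ("associated symptoms", False),
]
per_category, overall = retrieval_accuracy(example)
```

Because pooled accuracy weights categories by element count, it need not equal the unweighted mean of the per-category figures. That would be consistent with the reported 81% overall figure differing from the 77.5% mean of the four component accuracies, although the abstract does not confirm the scoring scheme.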
Our tripartite LLM model demonstrated accuracy in extracting essential information from published case reports. Further validation with EMR HPIs, and then in direct patient care, will be needed to move toward adaptation of enhanced diagnostic digital twins that incorporate real-time data from health-monitoring devices and self-monitoring assessments.