Shenzhen Center for Disease Control and Prevention, Shenzhen, China.
Department of Computer Science and Technology, Tsinghua University, Beijing, China.
J Med Internet Res. 2024 Aug 23;26:e54616. doi: 10.2196/54616.
For medical diagnosis, clinicians typically begin with a patient's chief concerns, followed by questions about symptoms and medical history, physical examinations, and requests for necessary auxiliary examinations to gather comprehensive medical information. This complex medical investigation process has yet to be modeled by existing artificial intelligence (AI) methodologies.
The aim of this study was to develop an AI-driven medical inquiry assistant for clinical diagnosis that provides inquiry recommendations by simulating clinicians' medical investigation logic via reinforcement learning.
We compiled multicenter, deidentified outpatient electronic health records from 76 hospitals in Shenzhen, China, covering July to November 2021. These records consisted of both unstructured textual information and structured laboratory test results. We first performed feature extraction and standardization using natural language processing techniques and then used a reinforcement learning actor-critic framework to explore rational and effective inquiry logic. To align the inquiry process with actual clinical practice, we segmented the inquiry into 4 stages: inquiring about symptoms and medical history, conducting physical examinations, requesting auxiliary examinations, and terminating the inquiry with a diagnosis. External validation was conducted to assess the inquiry logic of the AI model.
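The 4-stage segmentation can be illustrated with a minimal sketch. This is not the study's implementation: the action names are hypothetical, and the rule that an inquiry may stay in its current stage or move forward but never go back is an assumption made here for illustration. The sketch shows only how a stage structure could restrict which inquiry actions a policy is allowed to choose at each step.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The 4 inquiry stages named in the text, in clinical order."""
    HISTORY = 0          # symptoms and medical history
    PHYSICAL_EXAM = 1    # physical examinations
    AUXILIARY_EXAM = 2   # auxiliary (laboratory) examinations
    DIAGNOSIS = 3        # terminate the inquiry with a diagnosis


# Hypothetical action catalogue; the real feature set is not given in the text.
ACTIONS = {
    "ask_fever": Stage.HISTORY,
    "ask_cough": Stage.HISTORY,
    "examine_throat": Stage.PHYSICAL_EXAM,
    "order_blood_count": Stage.AUXILIARY_EXAM,
    "diagnose": Stage.DIAGNOSIS,
}


def allowed_actions(current_stage, already_taken):
    """Actions not yet taken whose stage is the current one or a later one.

    Assumption: the inquiry never returns to an earlier stage.
    """
    return sorted(
        name
        for name, stage in ACTIONS.items()
        if stage >= current_stage and name not in already_taken
    )


def step(current_stage, already_taken, action):
    """Apply one action and return the new (stage, taken-set) pair."""
    if action not in allowed_actions(current_stage, already_taken):
        raise ValueError(f"action {action!r} is not allowed at stage {current_stage.name}")
    return ACTIONS[action], already_taken | {action}


stage, taken = Stage.HISTORY, frozenset()
stage, taken = step(stage, taken, "ask_fever")
stage, taken = step(stage, taken, "examine_throat")
print(allowed_actions(stage, taken))
```

In a reinforcement learning setting, a mask like `allowed_actions` would typically be applied to the actor's output before an action is sampled, so that the learned policy cannot, for example, ask history questions after ordering laboratory tests.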
This study focused on 2 retrospective inquiry-and-diagnosis tasks in the emergency and pediatrics departments. The emergency departments provided records of 339,020 consultations, mainly of children (median age 5.2, IQR 2.6-26.1 years) with various types of upper respiratory tract infections (250,638/339,020, 73.93%). The pediatrics department provided records of 561,659 consultations, mainly of children (median age 3.8, IQR 2.0-5.7 years) with various types of upper respiratory tract infections (498,408/561,659, 88.73%). When conducting its own inquiries in the 2 scenarios, the AI model demonstrated high diagnostic performance, with areas under the receiver operating characteristic curve of 0.955 (95% CI 0.953-0.956) and 0.943 (95% CI 0.941-0.944), respectively. When the AI model was used in a simulated collaboration with physicians, it reduced the average number of physicians' inquiries to 46% (6.037/13.26; 95% CI 6.009-6.064) and 43% (6.245/14.364; 95% CI 6.225-6.269) of the original number while achieving areas under the receiver operating characteristic curve of 0.972 (95% CI 0.970-0.973) and 0.968 (95% CI 0.967-0.969), respectively. External validation revealed a normalized Kendall τ distance of 0.323 (95% CI 0.301-0.346), indicating that the AI model's inquiries were consistent with those of physicians.
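For readers unfamiliar with the external validation metric, the following is a minimal sketch of the standard normalized Kendall τ distance between two orderings of the same items. It is the textbook definition, not necessarily the exact procedure used in the study (how items that appear in only one sequence were handled is not stated), and the inquiry names are hypothetical. A value of 0 means the two orderings agree on every pair of items, and 1 means one ordering is the reverse of the other.

```python
from itertools import combinations


def normalized_kendall_tau_distance(order_a, order_b):
    """Fraction of item pairs that the two orderings rank in opposite order."""
    if len(set(order_a)) != len(order_a) or set(order_a) != set(order_b) \
            or len(order_a) != len(order_b):
        raise ValueError("both orderings must contain the same distinct items")
    n = len(order_a)
    if n < 2:
        return 0.0
    position_in_b = {item: index for index, item in enumerate(order_b)}
    # combinations() yields (x, y) with x before y in order_a;
    # the pair is discordant when order_b places y before x.
    discordant = sum(
        1
        for x, y in combinations(order_a, 2)
        if position_in_b[x] > position_in_b[y]
    )
    return discordant / (n * (n - 1) / 2)


# Hypothetical inquiry sequences for one consultation.
physician = ["fever", "cough", "throat_exam", "blood_count"]
model = ["cough", "fever", "throat_exam", "blood_count"]
print(normalized_kendall_tau_distance(physician, model))
```

Here the two sequences disagree on 1 of the 6 possible pairs (fever versus cough), so the distance is 1/6. Under this definition, the reported value of 0.323 would correspond to roughly one-third of inquiry pairs being ordered differently by the AI model and the physicians.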
This retrospective analysis of predominantly respiratory pediatric presentations in emergency and pediatrics departments demonstrated that an AI-driven diagnostic assistant had high diagnostic performance both in stand-alone use and in simulated collaboration with clinicians. Its investigation process was found to be consistent with the clinicians' medical investigation logic. These findings highlight the diagnostic assistant's promise in assisting the decision-making processes of health care professionals.