Barnett Alina Jade, Guo Zhicheng, Jing Jin, Ge Wendong, Kaplan Peter W, Kong Wan Yee, Karakis Ioannis, Herlopian Aline, Jayagopal Lakshman Arcot, Taraschenko Olga, Selioutski Olga, Osman Gamaleldin, Goldenholz Daniel, Rudin Cynthia, Westover M Brandon
Computer Science, Duke University, Durham, NC.
Pratt School of Engineering, Duke University, Durham, NC.
NEJM AI. 2024 Jun;1(6). doi: 10.1056/aioa2300331. Epub 2024 May 23.
In intensive care units (ICUs), critically ill patients are monitored with electroencephalography (EEG) to prevent serious brain injury. EEG monitoring is constrained by clinician availability, and EEG interpretation can be subjective and prone to interobserver variability. Automated deep-learning systems for EEG could reduce human bias and accelerate the diagnostic process. However, existing uninterpretable (black-box) deep-learning models are difficult to trust and troubleshoot and lack accountability in real-world applications, which has limited both trust in and adoption of such models by clinicians.
We developed an interpretable deep-learning system that accurately classifies six patterns of potentially harmful EEG activity - seizure, lateralized periodic discharges (LPDs), generalized periodic discharges (GPDs), lateralized rhythmic delta activity (LRDA), generalized rhythmic delta activity (GRDA), and other patterns - while providing faithful case-based explanations of its predictions. The model was trained on a total of 50,697 50-second continuous EEG samples collected from 2711 ICU patients at Massachusetts General Hospital between July 2006 and March 2020. EEG samples were labeled as one of the six EEG patterns by 124 domain experts and trained annotators. To evaluate the model, we asked eight medical professionals with relevant backgrounds to classify 100 EEG samples into the six pattern categories - once with and once without artificial intelligence (AI) assistance - and we assessed the assistive power of this interpretable system by comparing diagnostic accuracy under the two conditions. The model's discriminatory performance was evaluated with the area under the receiver-operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). The model's interpretability was measured with task-specific neighborhood agreement statistics that interrogated the similarities of samples and features. In a separate analysis, the latent space of the neural network was visualized by using dimension reduction techniques to examine whether the ictal-interictal injury continuum hypothesis, which asserts that seizures and seizure-like patterns of brain activity lie along a spectrum, is supported by the data.
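The discrimination metrics described above are standard one-vs-rest measures. The following Python sketch is not the authors' code; the labels, predicted probabilities, and sample sizes are hypothetical placeholders. It shows how per-class AUROC and AUPRC could be computed for a six-way EEG pattern classifier with scikit-learn.

# Minimal sketch, not the authors' code: per-class one-vs-rest AUROC and AUPRC
# for a six-way EEG pattern classifier, computed with scikit-learn. The labels
# and predicted probabilities below are random placeholders for illustration.
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.preprocessing import label_binarize

CLASSES = ["seizure", "LPD", "GPD", "LRDA", "GRDA", "other"]

def per_class_metrics(y_true, y_prob):
    """y_true: (n,) integer labels in [0, 6); y_prob: (n, 6) predicted probabilities."""
    y_bin = label_binarize(y_true, classes=list(range(len(CLASSES))))
    return {
        name: {
            "AUROC": roc_auc_score(y_bin[:, k], y_prob[:, k]),
            "AUPRC": average_precision_score(y_bin[:, k], y_prob[:, k]),
        }
        for k, name in enumerate(CLASSES)
    }

# Hypothetical usage showing the expected shapes.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 6, size=500)          # expert-assigned pattern labels
y_prob = rng.dirichlet(np.ones(6), size=500)   # model's class probabilities
print(per_class_metrics(y_true, y_prob))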
The performance of all users significantly improved when provided with AI assistance. Mean user diagnostic accuracy improved from 47% to 71% (P<0.04). The model achieved AUROCs of 0.87, 0.93, 0.96, 0.92, 0.93, and 0.80 for the classes seizure, LPD, GPD, LRDA, GRDA, and other patterns, respectively. This performance was significantly higher than that of a corresponding uninterpretable black-box model (P<0.0001). Videos traversing the ictal-interictal injury manifold obtained by dimension reduction (a two-dimensional representation of the original high-dimensional feature space) give insight into the layout of EEG patterns within the network's latent space and illuminate relationships between EEG patterns that were previously hypothesized but had not been shown explicitly. These results indicate that the ictal-interictal injury continuum hypothesis is supported by the data.
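The latent-space visualization described above can be sketched in a few lines. The snippet below is an assumed workflow, not the authors' implementation: it projects hypothetical latent embeddings to two dimensions with PCA (the abstract states only that dimension reduction techniques were used) and plots the six pattern classes to inspect their layout along the continuum.

# Minimal sketch under assumptions, not the authors' code: project hypothetical
# latent embeddings of EEG samples to two dimensions and plot them by pattern
# class. PCA is used here for simplicity; the abstract states only that
# dimension reduction techniques were applied.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

CLASSES = ["seizure", "LPD", "GPD", "LRDA", "GRDA", "other"]

rng = np.random.default_rng(0)
latent = rng.normal(size=(600, 128))    # placeholder latent vectors (n, latent_dim)
labels = rng.integers(0, 6, size=600)   # placeholder pattern labels

coords = PCA(n_components=2).fit_transform(latent)   # (600, 2) projection

fig, ax = plt.subplots(figsize=(6, 5))
for k, name in enumerate(CLASSES):
    mask = labels == k
    ax.scatter(coords[mask, 0], coords[mask, 1], s=8, label=name)
ax.set_xlabel("component 1")
ax.set_ylabel("component 2")
ax.legend(title="EEG pattern")
plt.show()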
Users showed a significant improvement in pattern classification accuracy with the assistance of this interpretable deep-learning model. The interpretable design facilitates effective human-AI collaboration, and this system may improve diagnosis and patient care in clinical settings. The model may also deepen understanding of how EEG patterns relate to one another along the ictal-interictal injury continuum. (Funded by the National Science Foundation, the National Institutes of Health, and others.)