Holberg EEG, Bergen, Norway.
Department of Clinical Neurophysiology, Haukeland University Hospital, Bergen, Norway.
JAMA Neurol. 2023 Aug 1;80(8):805-812. doi: 10.1001/jamaneurol.2023.1645.
IMPORTANCE: Electroencephalograms (EEGs) are a fundamental evaluation in neurology but require special expertise unavailable in many regions of the world. Artificial intelligence (AI) has the potential to address these unmet needs. Previous AI models address only limited aspects of EEG interpretation, such as distinguishing abnormal from normal recordings or identifying epileptiform activity. A comprehensive, fully automated, AI-based interpretation of routine EEG suitable for clinical practice is needed.
OBJECTIVE: To develop and validate an AI model (Standardized Computer-based Organized Reporting of EEG-Artificial Intelligence [SCORE-AI]) able to distinguish abnormal from normal EEG recordings and to classify abnormal EEG recordings into categories relevant for clinical decision-making: epileptiform-focal, epileptiform-generalized, nonepileptiform-focal, and nonepileptiform-diffuse.
DESIGN, SETTING, AND PARTICIPANTS: In this multicenter diagnostic accuracy study, a convolutional neural network model, SCORE-AI, was developed and validated using EEGs recorded between 2014 and 2020. Data were analyzed from January 17, 2022, until November 14, 2022. A total of 30 493 recordings of patients referred for EEG were included in the development data set, which was annotated by 17 experts. Patients older than 3 months who were not critically ill were eligible. SCORE-AI was validated using 3 independent test data sets: a multicenter data set of 100 representative EEGs evaluated by 11 experts; a single-center data set of 9785 EEGs evaluated by 14 experts; and, for benchmarking against previously published AI models, a data set of 60 EEGs with an external reference standard. No patients who met eligibility criteria were excluded.
MAIN OUTCOMES AND MEASURES: Diagnostic accuracy, sensitivity, and specificity compared with the experts and with the external reference standard of patients' habitual clinical episodes obtained during video-EEG recording.
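The three outcome measures named above can be illustrated with a minimal Python sketch. It assumes a binary decision (for example, abnormal vs normal) scored against a reference standard; the `Confusion` class, the function name, and the counts in the usage note are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of a binary test scored against a reference standard."""
    tp: int  # test positive, reference positive
    fp: int  # test positive, reference negative
    tn: int  # test negative, reference negative
    fn: int  # test negative, reference positive


def diagnostic_metrics(c: Confusion) -> dict:
    """Return sensitivity, specificity, and accuracy as fractions in [0, 1]."""
    positives = c.tp + c.fn   # all reference-positive recordings
    negatives = c.tn + c.fp   # all reference-negative recordings
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "sensitivity": c.tp / positives,
        "specificity": c.tn / negatives,
        "accuracy": (c.tp + c.tn) / (positives + negatives),
    }


# Hypothetical counts, for illustration only (not study data).
example = diagnostic_metrics(Confusion(tp=40, fp=5, tn=45, fn=10))
```

With these illustrative counts, sensitivity is 40/50, specificity is 45/50, and accuracy is 85/100. For the multiclass setting described in the abstract, one plausible approach is to compute the same quantities per category in a one-vs-rest fashion, although the abstract does not state how the authors did this.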
RESULTS: The characteristics of the EEG data sets were as follows: development data set (N = 30 493; 14 980 men; median age, 25.3 years [95% CI, 1.3-76.2 years]), multicenter test data set (N = 100; 61 men; median age, 25.8 years [95% CI, 4.1-85.5 years]), single-center test data set (N = 9785; 5168 men; median age, 35.4 years [95% CI, 0.6-87.4 years]), and test data set with external reference standard (N = 60; 27 men; median age, 36 years [95% CI, 3-75 years]). SCORE-AI achieved high accuracy, with an area under the receiver operating characteristic curve between 0.89 and 0.96 for the different categories of EEG abnormalities, and performance similar to that of human experts. Benchmarking against 3 previously published AI models was limited to comparing detection of epileptiform abnormalities. The accuracy of SCORE-AI (88.3%; 95% CI, 79.2%-94.9%) was significantly higher than that of the 3 previously published models (P < .001) and similar to that of human experts.
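The area under the receiver operating characteristic curve (AUC) reported above can be read as the probability that a randomly chosen abnormal recording receives a higher model score than a randomly chosen normal one. The sketch below computes it that way in plain Python; the function name and the toy scores are hypothetical, and this is not the authors' evaluation code.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    scores: model outputs, higher meaning more likely positive.
    labels: 1 (or True) for reference-positive, 0 (or False) for negative.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # A positive scored above a negative counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy scores for illustration only (not study data).
toy_auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

In the toy example, 3 of the 4 positive-negative pairs are ranked correctly, so the AUC is 0.75. The pairwise loop is quadratic in the number of recordings; for data sets the size of those in the study, a rank-based implementation from a statistics library would be the more practical choice.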
CONCLUSIONS AND RELEVANCE: In this study, SCORE-AI achieved human expert-level performance in fully automated interpretation of routine EEGs. Application of SCORE-AI may improve diagnosis and patient care in underserved areas and improve efficiency and consistency in specialized epilepsy centers.