Analytical Neurophysiology Lab, Montreal Neurological Institute and Hospital, Montreal, Quebec, Canada.
Neurophysiology Unit, Institute of Neurosurgery Dr. Asenjo, Santiago, Chile.
Epilepsia. 2024 Oct;65(10):3028-3037. doi: 10.1111/epi.18082. Epub 2024 Aug 14.
The automated interpretation of clinical electroencephalograms (EEGs) using artificial intelligence (AI) holds the potential to bridge the treatment gap in resource-limited settings and reduce the workload at specialized centers. However, to facilitate broad clinical implementation, it is essential to establish generalizability across diverse patient populations and equipment. We assessed whether SCORE-AI demonstrates diagnostic accuracy comparable to that of experts when applied to a geographically different patient population, recorded with distinct EEG equipment and technical settings.
We assessed the diagnostic accuracy of a "fixed-and-frozen" AI model, using an independent dataset and external gold standard, and benchmarked it against three experts blinded to all other data. The dataset comprised 50% normal and 50% abnormal routine EEGs, equally distributed among the four major classes of EEG abnormalities (focal epileptiform, generalized epileptiform, focal nonepileptiform, and diffuse nonepileptiform). To assess diagnostic accuracy, we computed sensitivity, specificity, and accuracy of the AI model and the experts against the external gold standard.
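The three metrics named above follow directly from the 2x2 confusion matrix of each reader (AI or expert) against the gold standard. A minimal sketch, using hypothetical counts rather than the study's data:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # true-positive rate
    specificity = tn / (tn + fp)                 # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # overall agreement
    return sensitivity, specificity, accuracy

# Hypothetical counts for one abnormality class against the gold standard
sens, spec, acc = diagnostic_metrics(tp=24, fp=2, tn=24, fn=2)
```

Here each abnormality class would be scored as its own binary task (present/absent), so a full evaluation repeats this computation once per class.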
We analyzed EEGs from 104 patients (64 females, median age = 38.6 [range = 16-91] years). SCORE-AI performed as well as the experts, with an overall accuracy of 92% (95% confidence interval [CI] = 90%-94%) versus 94% (95% CI = 92%-96%). There was no significant difference between SCORE-AI and the experts for any metric or category. SCORE-AI performed well independently of vigilance state (false classifications during wakefulness: 5/41 [12.2%], during sleep: 2/11 [18.2%]; p = .63) and normal variants (false classifications in the presence of normal variants: 4/14 [28.6%], in their absence: 3/38 [7.9%]; p = .07).
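The abstract does not name the test behind these p-values; a two-sided Fisher exact test on the corresponding 2x2 tables reproduces both reported values, so the sketch below uses it. This is an illustrative implementation, not necessarily the authors' analysis code.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for a 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1 = a + b
    col1, n = a + c, a + b + c + d
    total = comb(n, row1)

    def p(k):  # P(top-left cell == k) with all margins fixed
        return comb(col1, k) * comb(n - col1, row1 - k) / total

    p_obs = p(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    return sum(p(k) for k in range(lo, hi + 1) if p(k) <= p_obs * (1 + 1e-9))

# Misclassifications by vigilance state: 5/41 awake vs. 2/11 asleep
p_vigilance = fisher_exact_two_sided(5, 36, 2, 9)   # ≈ 0.63
# Misclassifications with vs. without normal variants: 4/14 vs. 3/38
p_variants = fisher_exact_two_sided(4, 10, 3, 35)   # ≈ 0.07
```

The small cell counts here (e.g., 2/11) are exactly the situation where an exact test is preferred over a chi-squared approximation.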
SCORE-AI achieved diagnostic performance equal to that of human experts on a dataset independent of its development data, drawn from a geographically distinct patient population and recorded with different equipment and technical settings.