An extended clinical EEG dataset with 15,300 automatically labelled recordings for pathology decoding.

Affiliations

Neuromedical AI Lab, Department of Neurosurgery, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Engelbergerstr. 21, 79106 Freiburg, Germany; BrainLinks-BrainTools, IMBIT (Institute for Machine-Brain Interfacing Technology), University of Freiburg, Georges-Köhler-Allee 201, 79110 Freiburg, Germany; Autonomous Intelligent Systems, Computer Science Department - University of Freiburg, Faculty of Engineering, University of Freiburg, Georges-Köhler-Allee 80, 79110 Freiburg, Germany.

Neuromedical AI Lab, Department of Neurosurgery, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Engelbergerstr. 21, 79106 Freiburg, Germany; BrainLinks-BrainTools, IMBIT (Institute for Machine-Brain Interfacing Technology), University of Freiburg, Georges-Köhler-Allee 201, 79110 Freiburg, Germany.

Publication information

Neuroimage Clin. 2023;39:103482. doi: 10.1016/j.nicl.2023.103482. Epub 2023 Jul 28.

Abstract

Automated clinical EEG analysis using machine learning (ML) methods is a growing area of EEG research. Previous studies on binary EEG pathology decoding have mainly used the Temple University Hospital (TUH) Abnormal EEG Corpus (TUAB), which contains approximately 3,000 manually labelled EEG recordings. To evaluate and eventually improve the generalisation performance of machine learning methods for EEG pathology decoding, larger, publicly available datasets are required. A number of studies have addressed the automatic labelling of large open-source datasets as an approach to creating new datasets for EEG pathology decoding, but little is known about the extent to which training on a larger, automatically labelled dataset affects the decoding performance of established deep neural networks. In this study, we automatically created additional pathology labels for the Temple University Hospital (TUH) EEG Corpus (TUEG) based on the medical reports, using a rule-based text classifier. We generated a dataset of 15,300 newly labelled recordings, which we call the TUH Abnormal Expansion EEG Corpus (TUABEX), and which is five times larger than the TUAB. Since the TUABEX contains more pathological (75%) than non-pathological (25%) recordings, we then selected a balanced subset of 8,879 recordings, the TUH Abnormal Expansion Balanced EEG Corpus (TUABEXB). To investigate how training on a larger, automatically labelled dataset affects the decoding performance of deep neural networks, we applied four established deep convolutional neural networks (ConvNets) to the task of pathological versus non-pathological classification and compared the performance of each architecture after training on different datasets. The results show that training on the automatically labelled TUABEXB dataset, rather than on the manually labelled TUAB dataset, increases accuracies on TUABEXB and even on TUAB itself for some architectures. We argue that automatic labelling of large open-source datasets can be used to efficiently utilise the massive amount of EEG data stored in clinical archives. We make the proposed TUABEXB available open source and thus offer a new dataset for EEG machine learning research.
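The labelling and balancing steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the classifier's rules nor the method used to select the balanced subset. The regular expressions, the function names `label_report` and `balanced_subset`, the example report texts, and the choice of random downsampling of the majority class are all assumptions made for illustration.

```python
import random
import re
from collections import Counter

# Hypothetical rules; the study's actual patterns are not given in the abstract.
ABNORMAL = re.compile(r"\babnormal\s+(eeg|record|study)\b", re.IGNORECASE)
NORMAL = re.compile(r"\bnormal\s+(eeg|record|study)\b", re.IGNORECASE)


def label_report(text):
    """Return 'pathological', 'non-pathological', or None if no rule fires."""
    if ABNORMAL.search(text):
        return "pathological"
    if NORMAL.search(text):
        return "non-pathological"
    return None  # leave the recording unlabelled rather than guess


def balanced_subset(labelled, seed=0):
    """Randomly downsample every class to the size of the smallest class.

    This is an assumed balancing strategy, chosen only for illustration.
    """
    rng = random.Random(seed)
    by_label = {}
    for rec_id, label in labelled:
        by_label.setdefault(label, []).append(rec_id)
    n = min(len(ids) for ids in by_label.values())
    subset = []
    for label, ids in sorted(by_label.items()):
        subset.extend((rec_id, label) for rec_id in rng.sample(ids, n))
    return subset


# Made-up report snippets standing in for the medical reports.
reports = {
    "rec1": "IMPRESSION: Abnormal EEG due to generalized slowing.",
    "rec2": "IMPRESSION: Normal EEG in wakefulness and sleep.",
    "rec3": "IMPRESSION: This is an abnormal record.",
    "rec4": "Technical difficulties, no impression given.",
}

labelled = [(rec_id, label_report(text)) for rec_id, text in reports.items()]
labelled = [(rec_id, label) for rec_id, label in labelled if label is not None]
subset = balanced_subset(labelled)
class_counts = Counter(label for _, label in subset)
print(class_counts)
```

Checking the abnormal pattern before the normal one and returning `None` when neither matches keeps ambiguous reports out of the labelled set, which trades coverage for label quality.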

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ea8a/10432245/a81daeeae9f9/gr1.jpg
