Massachusetts General Hospital, Department of Neurology, Boston, MA, United States; Harvard Medical School, Boston, MA, United States.
Georgia Institute of Technology, College of Computing, Atlanta, GA, United States.
J Neurosci Methods. 2021 Mar 1;351:108966. doi: 10.1016/j.jneumeth.2020.108966. Epub 2020 Oct 22.
Seizures and seizure-like electroencephalography (EEG) patterns, collectively referred to as "ictal interictal injury continuum" (IIIC) patterns, are commonly encountered in critically ill patients. Automated detection is important for patient care and to enable research. However, training accurate detectors requires a large labeled dataset. Active Learning (AL) may help select informative examples to label, but the optimal AL approach remains unclear.
We assembled >200,000 h of EEG from 1,454 hospitalized patients. From these, we collected 9,808 labeled and 120,000 unlabeled 10-second EEG segments. Labels included 6 IIIC patterns. In each AL iteration, a Dense-Net Convolutional Neural Network (CNN) learned vector representations for EEG segments using available labels, which were used to create a 2D embedding map. Nearest-neighbor label spreading within the embedding map was used to create additional pseudo-labeled data. A second Dense-Net was trained using real- and pseudo-labels. We evaluated several strategies for selecting candidate points for experts to label next. Finally, we compared two methods for class balancing within queries: standard balanced-based querying (SBBQ), and high confidence spread-based balanced querying (HCSBBQ).
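The nearest-neighbor label spreading step can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `spread_labels`, the neighbor count `k`, the agreement threshold `min_agreement`, and the class names `"A"` and `"B"` are all assumptions made for illustration, since the abstract does not specify them. The sketch assigns a pseudo-label to an unlabeled point in the 2D embedding only when enough of its nearest labeled neighbors agree.

```python
import math
from collections import Counter


def spread_labels(labeled_points, labels, unlabeled_points, k=3, min_agreement=1.0):
    """Pseudo-label unlabeled 2D points by a k-nearest-neighbor vote.

    Hypothetical helper: a point receives the majority label of its k nearest
    labeled neighbors only if that label's share of the vote is at least
    `min_agreement`; otherwise it stays unlabeled (None).
    """
    pseudo_labels = []
    for point in unlabeled_points:
        # Indices of the k labeled points closest to `point` (Euclidean distance).
        nearest = sorted(
            range(len(labeled_points)),
            key=lambda i: math.dist(point, labeled_points[i]),
        )[:k]
        votes = Counter(labels[i] for i in nearest)
        top_label, count = votes.most_common(1)[0]
        agreed = count / len(nearest) >= min_agreement
        pseudo_labels.append(top_label if agreed else None)
    return pseudo_labels


# Toy embedding: two well-separated clusters with placeholder class names.
labeled = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["A", "A", "A", "B", "B", "B"]
unlabeled = [(0.5, 0.5), (5.5, 5.5), (3, 2.5)]

print(spread_labels(labeled, labels, unlabeled))
```

In this toy example the two points inside a cluster inherit that cluster's label, while the point between the clusters has mixed neighbors and is left unlabeled under the strict default threshold. Lowering `min_agreement` trades pseudo-label purity for coverage.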
Our results show: 1) Label spreading increased convergence speed for AL. 2) All query criteria produced results similar to random sampling. 3) HCSBBQ query balancing performed best. Using label spreading and HCSBBQ query balancing, we were able to train models approaching expert-level performance across all pattern categories after obtaining ~7,000 expert labels.
Our results provide guidance regarding the use of AL to efficiently label large EEG datasets in critically ill patients.