Mohammad Umair, Saeed Fahad
Electrical, Computer and Biomedical Engineering Department, Union College, 807 Union St, Schenectady, NY 12308, USA.
Knight Foundation School of Computing and Information Sciences, Florida International University, 11200 SW 8th St, Miami, FL 33199, USA.
MethodsX. 2025 Aug 22;15:103574. doi: 10.1016/j.mex.2025.103574. eCollection 2025 Dec.
Predicting epileptic seizures is significantly more challenging than detecting them. However, most publicly available electroencephalography (EEG) datasets are geared towards detection because the ictal phase (the main symptomatic period) is annotated. Prediction, in contrast, requires annotated preictal and interictal phases. To this end, we designed and developed a method that converts any large EEG dataset annotated for detection into ML-ready data suitable for prediction. We apply the method to an existing EEG data corpus to generate 12 ML-ready benchmarks comprising data for training, validating, and testing seizure prediction models. Our strategy uses different variations of the seizure prediction horizon (SPH) and the seizure occurrence period (SOP) to produce more than 150 GB of ML-ready data. To illustrate the usefulness of the generated data, we technically validate all the benchmarks using multiple machine learning (ML) and deep learning (DL) models. We hope that computational groups will use the generated benchmarks to develop their seizure prediction models. The work can be summarized as follows:
1. Extract short preictal and interictal segments from long-duration annotated EEG montages.
2. Generate a comprehensive list of ML-ready benchmarks with varying SPH and SOP.
3. Technically validate the generated data with multiple ML and DL models, reaching up to 88.73% validation accuracy.
4. Open-source code and related materials are available at https://github.com/pcdslab/MLSPred-Bench.
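The segment-extraction idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (that lives in the linked repository). It assumes one common convention, which the abstract does not spell out: the preictal window has length SOP and ends SPH before the annotated seizure onset, and interictal data must lie at least a fixed gap away from any seizure. The names `Window`, `preictal_window`, `interictal_windows`, `split_segments`, and every numeric value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    label: str    # "preictal" or "interictal"


def preictal_window(onset_s, sph_s, sop_s):
    """Assumed convention: a window of length SOP that ends SPH before onset.

    Returns None when the recording does not reach back far enough.
    """
    end = onset_s - sph_s
    start = end - sop_s
    if start < 0:
        return None
    return Window(start, end, "preictal")


def interictal_windows(recording_s, seizures, gap_s):
    """Stretches that stay at least gap_s away from every (onset, offset) pair."""
    windows = []
    cursor = 0.0
    for onset, offset in sorted(seizures):
        end = onset - gap_s
        if end > cursor:
            windows.append(Window(cursor, end, "interictal"))
        cursor = max(cursor, offset + gap_s)
    if recording_s > cursor:
        windows.append(Window(cursor, recording_s, "interictal"))
    return windows


def split_segments(window, segment_s):
    """Cut a window into non-overlapping fixed-length segments, dropping any remainder."""
    count = int((window.end - window.start) // segment_s)
    return [
        Window(window.start + i * segment_s,
               window.start + (i + 1) * segment_s,
               window.label)
        for i in range(count)
    ]


# Illustrative values only: a 2 h recording with one seizure starting at 1 h.
pre = preictal_window(onset_s=3600.0, sph_s=300.0, sop_s=1800.0)
inter = interictal_windows(7200.0, [(3600.0, 3660.0)], gap_s=2400.0)
print(pre)
print(inter)
print(len(split_segments(pre, 5.0)), "preictal segments of 5 s")
```

Varying `sph_s` and `sop_s` over a grid of values is one way such a scheme could yield several benchmarks from the same annotated recordings. The gap is set larger than SPH + SOP here so that interictal and preictal windows cannot overlap.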