Achuri Maria Isabel Cano, Lara Montana Kay, Rabbo Khalil Abed, Wilson Benjamin T, Meek Austin, Mahoney J Matthew, Hernan Amanda E, Brockmeier Austin J
Department of Electrical and Computer Engineering, University of Delaware, Newark, Delaware, USA.
Department of Psychiatry, University of California San Diego, La Jolla, California, USA.
bioRxiv. 2025 Aug 20:2025.08.14.670397. doi: 10.1101/2025.08.14.670397.
Electroencephalograms (EEGs) are time-series records of the electrical potential from collective neural activity in the brain. EEG waveform patterns (rhythmic and irregular oscillations, and transient patterns of sharp waves or spikes) are potential phenotypical biomarkers, reflecting genotype-specific neural activity. This is especially relevant to diagnosing epilepsy without direct seizure observations, a situation that is common in clinical settings as well as in animal models, which often have subtle neurological phenotypes without overt epilepsy. Herein, we investigate genotypic prediction from long-term EEG signals of freely behaving mice belonging to six groups defined by the presence or absence of a neurological disease genotype (gene knockout) in three different inbred strains with distinct genetic backgrounds. The potential complexity of genotype-related EEG patterns motivates a machine learning approach that automatically extracts time-series descriptors, such as waveforms or spectral content, as biomarkers. Specifically, we propose to predict the genotypes of individual mice from the occurrence counts of waveforms that approximate short windows of the EEG. That is, a dictionary of waveforms is optimized to approximate windows from each genotype, and the vectors of waveform occurrence counts serve as the features for predicting genotypes via logistic regression models. Across two-fold cross-validation of the waveform dictionary learning and leave-one-individual-out genotype prediction, we find that waveform counts pooled over multiple hour segments enable reliable prediction of mouse strain with an accuracy of 70% (chance rate of 38%). For two of the three strains, DBA2 and C57B6, strain-specific classifiers reliably determined the epilepsy genotype (gene knockout) at a sensitivity of 67%, with a specificity of 100% for DBA2 and 67% for C57B6. None of the mice of these strains had evidence of overt seizures or EEG-based seizure detection. The methodologies and results show the potential of EEG waveforms as phenotypes and of bag-of-waves as a feature representation for identifying epilepsy genotypes.
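The bag-of-waves feature described above can be illustrated with a minimal sketch. It makes several assumptions that are not from the abstract: the dictionary here is hand-written rather than learned, each non-overlapping window is assigned to its single nearest waveform by squared Euclidean distance, and the function names `nearest_waveform` and `bag_of_waves` are hypothetical. The dictionary optimization and window approximation actually used in the study may differ.

```python
def nearest_waveform(window, dictionary):
    """Return the index of the dictionary waveform closest to the window.

    Closeness is squared Euclidean distance (an assumption for this sketch).
    """
    best_index, best_dist = 0, float("inf")
    for k, atom in enumerate(dictionary):
        dist = sum((x - a) ** 2 for x, a in zip(window, atom))
        if dist < best_dist:
            best_index, best_dist = k, dist
    return best_index


def bag_of_waves(signal, dictionary):
    """Count how often each dictionary waveform best matches a window.

    The signal is cut into non-overlapping windows whose length equals
    the waveform length; trailing samples that do not fill a window
    are ignored.
    """
    width = len(dictionary[0])
    counts = [0] * len(dictionary)
    for start in range(0, len(signal) - width + 1, width):
        window = signal[start:start + width]
        counts[nearest_waveform(window, dictionary)] += 1
    return counts


# Toy dictionary of three 4-sample waveforms (illustrative values only).
dictionary = [
    [0, 1, 0, -1],   # one cycle of an oscillation
    [0, 0, 0, 0],    # flat segment
    [1, 1, -1, -1],  # step-like transient
]

# Toy signal made of four windows (not real EEG data).
signal = [0, 1, 0, -1,
          0, 0, 0, 0,
          0.1, 0.9, 0, -1.1,
          1, 1, -1, -1]

features = bag_of_waves(signal, dictionary)
print(features)
```

In this sketch, the count vector for each recording segment would play the role of the feature vector passed to a classifier such as logistic regression, with all segments from one animal held out together during evaluation.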