Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, United States.
Department of Pediatrics (Division of Neurology), Children's Hospital of Philadelphia, United States; Departments of Neurology and Pediatrics, University of Pennsylvania Perelman School of Medicine, United States.
Seizure. 2021 Apr;87:61-68. doi: 10.1016/j.seizure.2021.03.001. Epub 2021 Mar 4.
To determine whether machine learning techniques would improve our ability to incorporate key variables into a parsimonious model with optimized performance for predicting electroencephalographic seizures (ES) in critically ill children.
We analyzed data from a prospective observational cohort study of 719 consecutive critically ill children with encephalopathy who underwent clinically indicated continuous EEG monitoring (CEEG). We implemented and compared three state-of-the-art machine learning methods for ES prediction: (1) random forest; (2) Least Absolute Shrinkage and Selection Operator (LASSO); and (3) Deep Learning Important FeaTures (DeepLIFT). We then developed a ranking algorithm based on the relative importance of each variable as derived from these methods.
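The abstract does not describe how the ranking algorithm combines the importances from the three methods. As a rough illustration only, the sketch below assumes one simple possibility: rank the variables within each method by absolute importance, then order them by mean rank across methods. The function names, variable names, and importance values are hypothetical and are not taken from the study.

```python
from statistics import mean


def rank_within_method(importances):
    """Map each variable to its rank (1 = most important) by absolute importance."""
    ordered = sorted(importances, key=lambda name: abs(importances[name]), reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def aggregate_ranking(per_method_importances):
    """Order variables by mean rank across methods (lowest mean rank first).

    Assumption: mean-rank aggregation; the study's actual rule may differ.
    """
    per_method_ranks = [rank_within_method(imp) for imp in per_method_importances.values()]
    variables = list(per_method_ranks[0])
    mean_rank = {v: mean(ranks[v] for ranks in per_method_ranks) for v in variables}
    # Ties in mean rank are broken alphabetically so the output is deterministic.
    return sorted(variables, key=lambda v: (mean_rank[v], v))


# Made-up importance scores, for illustration only (not study results).
toy_importances = {
    "random_forest": {"var_a": 0.50, "var_b": 0.30, "var_c": 0.20},
    "lasso": {"var_a": -1.20, "var_b": 0.00, "var_c": 0.40},
    "deeplift": {"var_a": 0.10, "var_b": 0.05, "var_c": 0.70},
}

ranking = aggregate_ranking(toy_importances)
print(ranking)
```

Working on ranks rather than raw scores sidesteps the fact that random forest importances, LASSO coefficients, and DeepLIFT attributions live on different scales; other aggregation rules (for example, normalized score averaging) would be equally plausible.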
Based on our ranking algorithm, the top five variables for ES prediction were: (1) epileptiform discharges in the initial 30 minutes, (2) clinical seizures prior to CEEG initiation, (3) sex, (4) age dichotomized at 1 year, and (5) epileptic encephalopathy. Compared with variables chosen by stepwise selection in logistic regression, the top variables selected by our ranking algorithm were more informative: models using them achieved better prediction performance as measured by prediction accuracy, AUROC, and F1 score. Including additional variables did not improve, and sometimes worsened, model performance.
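For readers unfamiliar with the three evaluation metrics named above, the following minimal sketch computes accuracy, F1 score, and AUROC from first principles. The labels, scores, and the 0.5 decision threshold are made-up assumptions for illustration and have no connection to the study data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def auroc(y_true, scores):
    """Probability that a random positive scores higher than a random negative.

    Ties count as half a win. Requires at least one positive and one negative.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = seizure, 0 = no seizure) and predicted probabilities.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = auroc(y_true, scores)
print(acc, f1, auc)
```

Accuracy and F1 depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is one reason the three are often reported together.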
The ranking algorithm was helpful in deriving a parsimonious model for ES prediction with optimal performance. However, applying state-of-the-art machine learning models did not substantially improve model performance compared with prior logistic regression models. Thus, to further improve ES prediction, we may need to collect more samples and additional variables that provide new information.