Zhao Yijun, Wang Tong, Bove Riley, Cree Bruce, Henry Roland, Lokhande Hrishikesh, Polgar-Turcsanyi Mariann, Anderson Mark, Bakshi Rohit, Weiner Howard L, Chitnis Tanuja
Department of Computer and Information Science, Fordham University, New York, NY USA.
University of California, San Francisco, CA USA.
NPJ Digit Med. 2020 Oct 16;3:135. doi: 10.1038/s41746-020-00338-8. eCollection 2020.
The rate of disability accumulation varies across multiple sclerosis (MS) patients. Machine learning techniques may offer more powerful means to predict disease course in MS patients. In our study, 724 patients from the Comprehensive Longitudinal Investigation in MS at Brigham and Women's Hospital (CLIMB study) and 400 patients from the EPIC dataset, University of California, San Francisco, were included in the analysis. The primary outcome was an increase in Expanded Disability Status Scale (EDSS) score of ≥ 1.5 (worsening) or not (non-worsening) at up to 5 years after the baseline visit. Classification models were built using the CLIMB dataset with patients' clinical and MRI longitudinal observations in the first 2 years, and further validated using the EPIC dataset. We compared the performance of three popular machine learning algorithms ([...]) and three ensemble learning approaches (XGBoost, LightGBM, and a Meta-learner). A "threshold" was established to trade off performance between the two classes. Predictive features were identified and compared among different models. Machine learning models achieved 0.79 and 0.83 AUC scores for the CLIMB and EPIC datasets, respectively, shortly after disease onset. Ensemble learning methods were more effective and robust compared to standalone algorithms. Two ensemble models, XGBoost and LightGBM, were superior to the other four models evaluated in our study. Of the variables evaluated, EDSS, [...], and [...] were the top common predictors in forecasting the MS disease course. Machine learning techniques, in particular ensemble methods, offer increased accuracy for the prediction of MS disease course.
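The outcome definition, the class trade-off "threshold", and the AUC metric described in the abstract can be illustrated with a minimal sketch. This is not the study's code: the function names, the toy EDSS values, and the model scores below are all hypothetical, and the abstract does not say how the threshold was actually chosen. The sketch assumes a model that outputs a score per patient, with scores at or above the threshold classified as worsening.

```python
from typing import Sequence, Tuple


def label_worsening(baseline_edss: float, followup_edss: float,
                    min_increase: float = 1.5) -> int:
    """Return 1 (worsening) if EDSS rose by at least `min_increase`, else 0."""
    return int(followup_edss - baseline_edss >= min_increase)


def sensitivity_specificity(y_true: Sequence[int], scores: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    """Classify score >= threshold as worsening; return recall for each class."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (worsening, non-worsening) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical (baseline EDSS, follow-up EDSS) pairs and model scores.
edss_pairs = [(1.0, 3.0), (2.0, 2.5), (1.5, 3.0), (3.0, 3.0), (2.0, 4.0), (1.0, 1.5)]
scores = [0.8, 0.4, 0.35, 0.2, 0.9, 0.6]

labels = [label_worsening(b, f) for b, f in edss_pairs]

if __name__ == "__main__":
    print("labels:", labels)
    print("AUC:", round(auc(labels, scores), 3))
    for t in (0.3, 0.5):
        sens, spec = sensitivity_specificity(labels, scores, t)
        print(f"threshold={t}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Lowering the threshold in this toy data catches more worsening patients at the cost of more false alarms among non-worsening patients, which is the kind of trade-off between the two classes the abstract refers to. The AUC is unaffected by the threshold, since it depends only on how the scores rank the two classes.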