Department of Biostatistics, School of Public Health, Xuzhou Medical University, 209 Tongshan Road, Xuzhou, 221004, Jiangsu Province, China.
School of Mathematical Sciences, Huaibei Normal University, Huaibei, 235000, Anhui Province, China.
J Neurol. 2024 Apr;271(4):2010-2018. doi: 10.1007/s00415-023-12156-5. Epub 2024 Jan 4.
Patients with Parkinson's disease (PD) are heterogeneous and are commonly divided into tremor-dominant (TD) and non-tremor-dominant (NTD) motor subtypes. Rapid identification of a patient's motor subtype may help clinicians develop personalized treatment plans.
Data were obtained from the Parkinson's Progression Markers Initiative (PPMI). Predictors were selected by recursive feature elimination (RFE). Seven classical machine learning (ML) models, including logistic regression, support vector machine, decision tree, random forest, and extreme gradient boosting, were then trained to predict patients' motor subtypes. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC) and validated on follow-up data.
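The selection-then-classification workflow described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the sample size, the 50 candidate predictors, the 70/30 split, and the use of logistic regression as the ranking estimator inside RFE are all assumptions made for the example. scikit-learn's `RFE` needs an estimator that exposes coefficients or feature importances, which a polynomial-kernel SVM does not, so a linear model does the ranking here.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the clinical table (hypothetical: 400 patients,
# 50 candidate predictors, binary label such as TD = 0 / NTD = 1).
X, y = make_classification(
    n_samples=400, n_features=50, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardize using training-set statistics only, to avoid leaking test data.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Recursive feature elimination: repeatedly fit the ranking model and drop
# the weakest feature until 20 remain (assumed ranking model: logistic regression).
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=20)
selector.fit(X_train_s, y_train)
n_selected = int(selector.support_.sum())

# One of the candidate classifiers: an SVM with a polynomial kernel.
model = SVC(kernel="poly", degree=3)
model.fit(selector.transform(X_train_s), y_train)

# AUC is computed from continuous decision scores rather than hard labels.
scores = model.decision_function(selector.transform(X_test_s))
auc = roc_auc_score(y_test, scores)
print(f"features kept: {n_selected}, test AUC: {auc:.3f}")
```

In a real analysis the other candidate models would be fitted on the same selected subset and compared on the same held-out data, with the follow-up visits serving as an additional validation set.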
The feature subset selected by RFE comprised 20 features, including clinical assessments and cerebrospinal fluid α-synuclein (CSF α-syn). ML models fitted on the RFE subset performed better in the test and validation sets. The best-performing model was a support vector machine with a polynomial kernel (P-SVM), which achieved an AUC of 0.898. Repeated five-fold cross-validation showed that the P-SVM model with CSF α-syn performed better than the model without it (P = 0.034). Shapley additive explanations (SHAP) plots illustrated how the level of each feature affects the predicted probability of the NTD subtype.
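A comparison of a model with and without one predictor under repeated five-fold cross-validation might look like the sketch below. The abstract does not state which statistical test produced the reported P value, so the paired t-test is an assumption, as are the synthetic data, the 10 repeats, and the column index standing in for CSF α-syn. Fold-level scores are not independent, so a P value obtained this way is only approximate.

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 20 selected features (hypothetical data).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, random_state=1
)
biomarker_col = 0  # hypothetical position of the biomarker column

# A fixed random_state makes both runs use identical train/test splits,
# so the per-fold AUCs can be compared pairwise.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=1)


def make_model():
    """Scaling inside the pipeline keeps each fold free of test-data leakage."""
    return make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3))


auc_with = cross_val_score(make_model(), X, y, cv=cv, scoring="roc_auc")
auc_without = cross_val_score(
    make_model(), np.delete(X, biomarker_col, axis=1), y, cv=cv, scoring="roc_auc"
)

# Paired test on the 50 fold-level AUCs (assumed test, see the note above).
t_stat, p_value = ttest_rel(auc_with, auc_without)
print(f"mean AUC with: {auc_with.mean():.3f}, without: {auc_without.mean():.3f}")
print(f"paired t-test p-value: {p_value:.3f}")
```

Keeping the splits identical across the two runs is the key design choice: it removes fold-to-fold variation from the comparison, so the difference reflects the dropped feature rather than the partitioning.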
An interactive web application was developed based on the P-SVM model built on the RFE-selected feature subset. It can identify the current motor subtype of a PD patient, making it easier to assess the patient's status and develop a personalized treatment plan.