Department of Neurology, Xiangya Hospital, Central South University, Changsha, China.
National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, China.
Mov Disord. 2021 Jan;36(1):216-224. doi: 10.1002/mds.28311. Epub 2020 Sep 29.
In polyglutamine (polyQ) diseases, predicting a patient's age at onset (AAO) facilitates the development of disease-modifying interventions and underpins efforts to delay disease onset and progression. Few polyQ disease studies have evaluated AAO prediction using both machine-learning algorithms and linear regression methods.
The objective of this study was to develop a machine-learning model for AAO prediction in the largest spinocerebellar ataxia type 3/Machado-Joseph disease (SCA3/MJD) population from mainland China.
In this observational study, we introduced an innovative approach by systematically comparing the performance of 7 machine-learning algorithms with linear regression to explore AAO prediction in SCA3/MJD using CAG expansions of 10 polyQ-related genes, sex, and parental origin.
Prediction performance on the testing and training sets was similar for each model, and little overfitting of the training data was observed. Overall, the XGBoost model outperformed both the traditional linear regression method and the other 6 machine-learning algorithms in AAO prediction on the training and testing sets. The optimal XGBoost model achieved a mean absolute error, root mean square error, and median absolute error of 5.56, 7.13, and 4.15 years, respectively, in testing set 1, and 4.78, 6.31, and 3.59 years, respectively, in testing set 2.
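For readers unfamiliar with the three error measures reported above, the following is a minimal sketch of how they are conventionally computed from observed and predicted ages at onset. It is not the authors' code; the function name `regression_errors` and the example ages are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, median


def regression_errors(observed, predicted):
    """Return (MAE, RMSE, median absolute error) in the units of the inputs."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    # Absolute difference between each observed and predicted value
    abs_err = [abs(o - p) for o, p in zip(observed, predicted)]
    mae = mean(abs_err)                          # mean absolute error
    rmse = sqrt(mean([e * e for e in abs_err]))  # root mean square error
    medae = median(abs_err)                      # median absolute error
    return mae, rmse, medae


# Hypothetical ages at onset in years (illustrative only, not study data)
observed = [30, 42, 35, 50]
predicted = [33, 40, 35, 57]
mae, rmse, medae = regression_errors(observed, predicted)
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  MedAE={medae:.2f}")
```

By construction the root mean square error is never smaller than the mean absolute error, which agrees with the ordering of the values reported for both testing sets. A median absolute error below the mean absolute error, as reported here, indicates that a minority of patients with large prediction errors pull the mean upward.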
Machine-learning algorithms can be used to predict AAO in patients with SCA3/MJD. The optimal XGBoost algorithm can provide a good reference for the establishment and optimization of prediction models for SCA3/MJD or other polyQ diseases. © 2020 International Parkinson and Movement Disorder Society.