From the Institute of Science and Technology for Brain-Inspired Intelligence (J. You, L.W., Y.W., J.K., W.C., J.F.), and Department of Neurology (J. Yu), Huashan Hospital, Fudan University; Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University) (W.C., J.F.), Ministry of Education, Shanghai; Fudan ISTBI-ZJNU Algorithm Centre for Brain-inspired Intelligence (W.C., J.F.), Zhejiang Normal University; Shanghai Medical College and Zhongshan Hospital Immunotherapy Technology Transfer Center (W.C.); Zhangjiang Fudan International Innovation Center (J.F.); and School of Data Science (J.F.), Fudan University, Shanghai, China.
Neurology. 2024 Aug 13;103(3):e209531. doi: 10.1212/WNL.0000000000209531. Epub 2024 Jul 8.
Identification of individuals at high risk of developing Parkinson disease (PD) several years before diagnosis is crucial for developing treatments to prevent or delay neurodegeneration. This study aimed to develop predictive models for PD risk that combine plasma proteins and easily accessible clinical-demographic variables.
Using data from the UK Biobank (UKB), which recruited participants across the United Kingdom, we conducted a longitudinal study to identify predictors of incident PD. Participants with baseline plasma protein measurements and no PD were included. Through machine learning, we narrowed down predictors from a pool of 1,463 plasma proteins and 93 clinical-demographic variables. These predictors were then externally validated using the Parkinson's Progression Markers Initiative (PPMI) cohort. To further investigate the temporal trends of predictors, a nested case-control study was conducted within the UKB.
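As an illustration of how a large candidate pool can be narrowed, the following is a minimal sketch of greedy forward selection. It is not the authors' pipeline: the function name `forward_select`, the stopping threshold `min_gain`, and the toy scoring function are all assumptions made for the example. In practice `score` would train a model on the chosen features and return a cross-validated AUC.

```python
from typing import Callable, List, Optional, Sequence


def forward_select(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
    min_gain: float = 1e-3,          # hypothetical stopping threshold
    max_features: Optional[int] = None,
) -> List[str]:
    """Greedily add the candidate that most improves score(selected)."""
    selected: List[str] = []
    remaining = list(candidates)
    best = float("-inf")
    while remaining and (max_features is None or len(selected) < max_features):
        # Score every one-feature extension of the current panel.
        trial = {f: score(selected + [f]) for f in remaining}
        top = max(trial, key=trial.get)
        # Stop once the best extension no longer adds enough.
        if selected and trial[top] - best < min_gain:
            break
        selected.append(top)
        remaining.remove(top)
        best = trial[top]
    return selected


# Toy stand-in for "train a model and return its AUC" (made-up weights).
TOY_WEIGHTS = {"protein_a": 0.10, "protein_b": 0.05, "protein_c": 0.0001}


def toy_score(features: List[str]) -> float:
    return 0.5 + sum(TOY_WEIGHTS[f] for f in features)


panel = forward_select(["protein_c", "protein_b", "protein_a"], toy_score)
print(panel)
```

With these toy weights, the third candidate adds less than `min_gain`, so the selection stops at a two-feature panel.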
A total of 52,503 participants without PD (median age 58, 54% female) were included. Over a median follow-up duration of 14.0 years, 751 individuals were diagnosed with PD (median age 65, 37% female). Using a forward selection approach, we selected a panel of 22 plasma proteins for optimal prediction. Using an ensemble tree-based Light Gradient Boosting Machine (LightGBM) algorithm, the model achieved an area under the receiver operating characteristic curve (AUC) of 0.800 (95% CI 0.785-0.815). The LightGBM prediction model integrating both plasma proteins and clinical-demographic variables demonstrated enhanced predictive accuracy, with an AUC of 0.832 (95% CI 0.815-0.849). Key predictors identified included age, years of education, history of traumatic brain injury, and serum creatinine. The incorporation of 11 plasma proteins (neurofilament light, integrin subunit alpha V, hematopoietic PGD synthase, histamine N-methyltransferase, tubulin polymerization promoting protein family member 3, ectodysplasin A2 receptor, latexin, interleukin-13 receptor subunit alpha-1, BAG family molecular chaperone regulator 3, tryptophanyl-tRNA synthetase, and secretogranin-2) augmented the model's predictive accuracy. External validation in the PPMI cohort confirmed the model's reliability, producing an AUC of 0.810 (95% CI 0.740-0.873). Notably, alterations in these predictors were detectable several years before the diagnosis of PD.
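For readers unfamiliar with the metric, the sketch below shows one way an AUC and a 95% CI could be computed from predicted risks. The abstract does not state how the intervals were derived, so the percentile bootstrap used here is an assumption, and the function names `auc` and `bootstrap_ci` and the four-person toy data are illustrative only.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random case outscores a random non-case (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC (assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only cases or only non-cases.
        if 0 < sum(ys) < len(ys):
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Toy data: 1 = later diagnosed, 0 = not; scores are made-up predicted risks.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
point = auc(toy_labels, toy_scores)
lo, hi = bootstrap_ci(toy_labels, toy_scores)
print(point, lo, hi)
```

The pairwise definition is quadratic in sample size; for a cohort of tens of thousands, a rank-based implementation would be the practical choice.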
Our findings support the potential utility of a machine learning-based model integrating clinical-demographic variables with plasma proteins to identify individuals at high risk for PD within the general population. Although these predictors have been validated in the PPMI cohort, additional validation in a more diverse population reflective of the general community is essential.