Smith Alexander, Elliott Paul, Mayr Manuel, Dehghan Abbas, Tzoulaki Ioanna
Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK.
Department of Epidemiology and Biostatistics, Sir Michael Uren Hub, 86 Wood Lane, London, W12 0BZ, UK.
Sci Rep. 2025 Jul 1;15(1):20520. doi: 10.1038/s41598-025-06232-1.
Plasma proteomics provides a unique opportunity to enhance disease prediction by capturing protein expression patterns linked to diverse pathological processes. Leveraging data from 2,923 proteins measured in 53,030 UK Biobank participants, we developed proteomic risk scores for 27 common outcomes over 5- and 15-year follow-up periods using two approaches: a linear ElasticNet regression model and a deep learning neural network (NN) model. Using Cox regression, we assessed the discrimination of proteomic risk scores either in isolation or as incremental improvements over clinical risk factors. We also studied the shared and unique protein predictors across conditions. Proteomic risk scores demonstrated strong discrimination for most outcomes, with a C-index > 0.80 for 12 diseases. NN models outperformed linear models for 11 outcomes, particularly for diseases such as Parkinson's disease (C-index 0.84) and pulmonary embolism (C-index 0.83), where nonlinear relationships contributed significantly to prediction. Across all outcomes, the addition of proteomic scores to clinical models improved predictive accuracy (ΔC-index 0.03), with the greatest gains observed in 9 diseases (ΔC-index > 0.1), including end-stage renal disease, pulmonary embolism, and Parkinson's disease. Analysis of protein contributions revealed shared predictors across multiple diseases, such as growth differentiation factor 15 (GDF15), as well as unique predictors like PAEP for endometriosis. While NN models may capture complex relationships, linear models provided value through simplicity and interpretability. These findings underscore the importance of tailoring predictive approaches to specific diseases and demonstrate the pivotal potential of proteomics in advancing risk stratification and early detection.
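The abstract reports discrimination as a C-index and the gain from adding proteomic scores as a ΔC-index. As a rough illustration of what those quantities measure, the sketch below computes Harrell's concordance index in plain Python and takes the difference between two risk scores. The function name, the toy follow-up data, and the two score vectors are hypothetical and are not taken from the study. Pairs with tied follow-up times are simply skipped here, a simplification that may differ from the handling in the authors' Cox-based analysis.

```python
from itertools import combinations


def harrell_c_index(times, events, scores):
    """Harrell's C: among comparable pairs, the fraction in which the
    participant with the earlier observed event has the higher risk score.
    Tied scores count as half-concordant."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Pairs with tied times are skipped (simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Hypothetical toy cohort: follow-up in years, event flag (1 = event, 0 = censored).
times = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]
events = [1, 1, 0, 1, 0, 0]

# Hypothetical risk scores from a clinical-only model and a clinical + proteomic model.
clinical_score = [0.9, 0.3, 0.5, 0.4, 0.6, 0.1]
combined_score = [0.9, 0.7, 0.5, 0.6, 0.2, 0.1]

c_clinical = harrell_c_index(times, events, clinical_score)
c_combined = harrell_c_index(times, events, combined_score)
delta_c = c_combined - c_clinical  # the ΔC-index for this toy example

print(f"C-index (clinical): {c_clinical:.3f}")
print(f"C-index (combined): {c_combined:.3f}")
print(f"Delta C-index:      {delta_c:.3f}")
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so on this scale the reported overall ΔC-index of 0.03 is a modest improvement in ranking, while the gains above 0.1 reported for 9 diseases are substantial.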