Diabetes and Cardiovascular Disease-Genetic Epidemiology, Department of Clinical Sciences in Malmö, Lund University, Clinical Research Centre House 60 Floor 13, Jan Waldenströms gata 35, 205 02, Malmö, Sweden.
Department of Population Medicine, College of Medicine Qatar University, Doha, Qatar.
Sci Rep. 2021 Mar 24;11(1):6734. doi: 10.1038/s41598-021-85991-z.
Novel methods to characterize the plasma proteome have made it possible to examine a wide range of proteins in large longitudinal cohort studies, but the complexity of the human proteome makes it difficult to identify robust protein-disease associations. Nevertheless, identification of individuals at high risk of early mortality is a central issue in clinical decision making, and novel biomarkers may be useful to improve risk stratification. With adjustment for established risk factors, we examined the associations between 138 plasma proteins measured using two proximity extension assays and long-term risk of all-cause mortality in 3,918 participants of the population-based Malmö Diet and Cancer Study. To examine the reproducibility of protein-mortality associations, we used a two-step random-split approach to simulate a discovery and a replication cohort and conducted analyses using four different methods: Cox regression, stepwise Cox regression, Lasso-Cox regression, and random survival forest (RSF). In the total study population, we identified eight proteins that were associated with all-cause mortality after adjustment for established risk factors and with Bonferroni correction for multiple testing. In the two-step analyses, the number of proteins selected for model inclusion in both random samples ranged from 6 to 21 depending on the method used. However, only three proteins were consistently included in both samples across all four methods: growth/differentiation factor-15 (GDF-15), N-terminal pro-B-type natriuretic peptide, and epididymal secretory protein E4. Using the total study population, the C-statistic for a model including established risk factors was 0.7222 and increased to 0.7284 with inclusion of the most predictive protein (GDF-15; P < 0.0001). All multiple-protein models showed additional improvement in the C-statistic compared to the single-protein model (all P < 0.0001).
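The two-step random-split logic described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the equal 50/50 split, the seed, and the significance level of 0.05 are all assumptions made for illustration, since the abstract does not state them. The sketch shows only the bookkeeping: a Bonferroni threshold for 138 tests, a random split into two halves, and the set of proteins that are selected in both halves by every method.

```python
import random


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def random_split(ids, seed=0):
    """Shuffle participant IDs and split them into two halves.

    The equal split and the seed are assumptions; the abstract does not
    give the split proportion.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def replicated(selected_discovery, selected_replication):
    """Proteins selected in both the discovery and the replication sample."""
    return set(selected_discovery) & set(selected_replication)


def consistent_across_methods(per_method):
    """Proteins replicated in both samples for every method.

    per_method maps a method name to a pair
    (proteins selected in discovery, proteins selected in replication).
    """
    replicated_sets = [replicated(d, r) for d, r in per_method.values()]
    return set.intersection(*replicated_sets) if replicated_sets else set()


# Threshold for 138 proteins at an assumed alpha of 0.05.
threshold = bonferroni_threshold(0.05, 138)

# Split 3,918 hypothetical participant IDs into two halves.
discovery_ids, replication_ids = random_split(range(3918), seed=0)

# Toy selections (made-up placeholders, not results from the study).
toy = {
    "cox": ({"A", "B", "C"}, {"A", "B", "D"}),
    "lasso": ({"A", "B", "E"}, {"A", "B", "C"}),
}
stable = consistent_across_methods(toy)
```

In this toy input, only proteins "A" and "B" survive both the replication step and the intersection across methods, which mirrors how a long per-method list can shrink to a short consistent core.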
We identified several plasma proteins associated with increased risk of all-cause mortality independently of established risk factors. Further investigation into the putatively causal role of these proteins for longevity is needed. In addition, the examined methods for identifying multiple proteins showed a tendency toward overfitting, including several putatively false-positive findings. Thus, the reproducibility of findings using such approaches may be limited.
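For readers unfamiliar with the C-statistic used to compare the models, the following is a minimal sketch of one common definition for survival data, Harrell's concordance index. Whether the authors used exactly this estimator is an assumption; the function name and the toy data are hypothetical. A pair of participants is counted as comparable when the one with the shorter follow-up time had an observed event, and the pair is concordant when that participant also has the higher predicted risk.

```python
def harrell_c(times, events, risk_scores):
    """Simplified Harrell's C for right-censored data.

    times: follow-up times
    events: 1 if death was observed, 0 if censored
    risk_scores: higher value means higher predicted risk
    Pairs with tied follow-up times are ignored; tied risk scores
    count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): the earliest deaths have the highest risk scores.
toy_times = [1, 2, 3, 4]
toy_events = [1, 1, 0, 1]
perfect = harrell_c(toy_times, toy_events, [4, 3, 2, 1])
reversed_order = harrell_c(toy_times, toy_events, [1, 2, 3, 4])
uninformative = harrell_c(toy_times, toy_events, [1, 1, 1, 1])
```

A value of 0.5 corresponds to a model that ranks no better than chance and 1.0 to perfect ranking, so a change from 0.7222 to 0.7284 is a small absolute gain even when it is statistically significant.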