Zhang Yuting, Ghose Upamanyu, Buckley Noel J, Engelborghs Sebastiaan, Sleegers Kristel, Frisoni Giovanni B, Wallin Anders, Lleó Alberto, Popp Julius, Martinez-Lage Pablo, Legido-Quigley Cristina, Barkhof Frederik, Zetterberg Henrik, Visser Pieter Jelle, Bertram Lars, Lovestone Simon, Nevado-Holgado Alejo J, Shi Liu
Department of Psychiatry, University of Oxford, Oxford, United Kingdom.
Department of Biomedical Sciences, Reference Center for Biological Markers of Dementia (BIODEM), Institute Born-Bunge, University of Antwerp, Antwerp, Belgium.
Front Aging Neurosci. 2022 Nov 29;14:1040001. doi: 10.3389/fnagi.2022.1040001. eCollection 2022.
Blood-based biomarkers are a promising approach to identifying early Alzheimer's disease (AD). Previous research has applied traditional machine learning (ML) to plasma omics data to search for potential biomarkers, but the most modern ML methods, based on deep learning, have scarcely been explored. In the current study, we aim to harness state-of-the-art deep learning neural networks (NNs) to identify plasma proteins that predict amyloid, tau, and neurodegeneration (AT[N]) pathologies in AD.
We measured 3,635 proteins using SOMAscan in 881 participants from the European Medical Information Framework for AD Multimodal Biomarker Discovery study (EMIF-AD MBD). Participants underwent measurements of brain amyloid β (Aβ) burden, phosphorylated tau (p-tau) burden, and total tau (t-tau) burden to determine their AT(N) statuses. We ranked proteins by their association with Aβ, p-tau, t-tau, and AT(N), and fed the top 100 proteins, along with age and apolipoprotein E (APOE) status, into NN classifiers as input features to predict these four AD-relevant outcomes. We compared the performance of NNs using proteins, age, and APOE genotype with that of NNs using age and APOE status alone, to identify protein panels that optimally improved prediction over these main risk factors. Proteins that improved the prediction for each outcome were aggregated and nominated for pathway enrichment and protein-protein interaction enrichment analysis.
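The feature-selection step described above (rank proteins by association with an outcome, keep the top-ranked ones, append age and APOE status) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the abstract does not name the association statistic, so absolute Pearson correlation with a binary label is used here as a stand-in, and the function names, the toy data, and the choice of k = 5 (rather than 100) are all hypothetical.

```python
import random
import statistics


def abs_correlation(values, labels):
    """Absolute Pearson correlation between one feature and a 0/1 label.

    Assumption: the abstract does not state which association measure
    was used; correlation is only an illustrative stand-in.
    """
    mean_v = statistics.fmean(values)
    mean_l = statistics.fmean(labels)
    cov = sum((v - mean_v) * (l - mean_l) for v, l in zip(values, labels))
    var_v = sum((v - mean_v) ** 2 for v in values)
    var_l = sum((l - mean_l) ** 2 for l in labels)
    if var_v == 0 or var_l == 0:
        return 0.0  # a constant feature or label carries no association
    return abs(cov) / (var_v * var_l) ** 0.5


def top_k_features(columns, labels, k):
    """Names of the k features most strongly associated with the labels."""
    scores = {name: abs_correlation(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def build_inputs(columns, selected, age, apoe):
    """One row per participant: selected proteins, then age, then APOE status."""
    return [
        [columns[name][i] for name in selected] + [age[i], apoe[i]]
        for i in range(len(age))
    ]


# Synthetic toy data (hypothetical; not EMIF-AD MBD measurements).
rng = random.Random(0)
n = 200
labels = [i % 2 for i in range(n)]  # balanced binary outcome
columns = {f"protein_{j}": [rng.gauss(0, 1) for _ in range(n)] for j in range(20)}
# Make exactly one protein informative so the ranking has something to find.
columns["protein_3"] = [rng.gauss(2.0 * y, 1) for y in labels]
age = [rng.uniform(55, 85) for _ in range(n)]
apoe = [rng.randint(0, 1) for _ in range(n)]  # hypothetical carrier flag

selected = top_k_features(columns, labels, k=5)
X = build_inputs(columns, selected, age, apoe)
```

Each row of `X` could then be passed to any classifier. In practice the ranking would typically be computed on training data only, so that held-out performance is not inflated by information from the test set.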
Age and APOE alone predicted Aβ, p-tau, t-tau, and AT(N) burden with area under the curve (AUC) scores of 0.748, 0.662, 0.710, and 0.795, respectively. The addition of proteins significantly improved AUCs to 0.782, 0.674, 0.734, and 0.831, respectively. The identified proteins were enriched in five clusters of AD-associated pathways including human immunodeficiency virus 1 infection, p53 signaling pathway, and phosphoinositide-3-kinase-protein kinase B/Akt signaling pathway.
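To make the comparison concrete, the sketch below computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, then tabulates the AUC gain for each outcome from the figures reported above. The `auc_score` helper and the dictionary keys are illustrative names, not taken from the study, and the abstract does not describe how significance was tested, so no test is shown.

```python
def auc_score(labels, scores):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# AUCs as reported in the abstract.
baseline = {"Abeta": 0.748, "p-tau": 0.662, "t-tau": 0.710, "AT(N)": 0.795}
with_proteins = {"Abeta": 0.782, "p-tau": 0.674, "t-tau": 0.734, "AT(N)": 0.831}

# Absolute improvement from adding proteins to age and APOE.
gains = {name: round(with_proteins[name] - baseline[name], 3) for name in baseline}

for name, gain in gains.items():
    print(f"{name}: {baseline[name]:.3f} -> {with_proteins[name]:.3f} (+{gain:.3f})")
```

The gains are 0.034 for Aβ, 0.012 for p-tau, 0.024 for t-tau, and 0.036 for AT(N), so the added proteins help most for the composite AT(N) outcome and least for p-tau.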
Combined with age and APOE genotype, the identified proteins have the potential to serve as blood-based biomarkers for AD and await validation in future studies. Although the NNs did not achieve better scores than the support vector machine model used in our previous study, their performance was likely limited by the small sample size.