
Applying machine learning to high-dimensional proteomics datasets for the identification of Alzheimer's disease biomarkers.

Authors

Ivarsson Orrelid Christoffer, Rosberg Oscar, Weiner Sophia, Johansson Fredrik D, Gobom Johan, Zetterberg Henrik, Mwai Newton, Stempfle Lena

Affiliations

Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Rännvägen 6b, 41296, Gothenburg, Västra Götalandsregionen, Sweden.

Department of Psychiatry and Neurochemistry, The Sahlgrenska Academy at the University of Gothenburg, Wallinsgatan 6, 43141, Mölndal, Västra Götalandsregionen, Sweden.

Publication information

Fluids Barriers CNS. 2025 Mar 3;22(1):23. doi: 10.1186/s12987-025-00634-z.

DOI:10.1186/s12987-025-00634-z
PMID:40033432
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11874791/
Abstract

PURPOSE

This study explores the application of machine learning to high-dimensional proteomics datasets for identifying Alzheimer's disease (AD) biomarkers. AD, a neurodegenerative disorder affecting millions worldwide, necessitates early and accurate diagnosis for effective management.

METHODS

We leverage Tandem Mass Tag (TMT) proteomics data from cerebrospinal fluid (CSF) samples from the frontal cortex of patients with idiopathic normal pressure hydrocephalus (iNPH), a condition often comorbid with AD, with rare access to both lumbar and ventricular samples. Our methodology includes extensive data preprocessing to address batch effects and missing values, followed by the Synthetic Minority Over-sampling Technique (SMOTE) for data augmentation to overcome the small sample size. We apply linear and non-linear machine learning models, as well as ensemble methods, to compare iNPH patients with and without biomarker evidence of AD pathology ( or ) in a classification task.
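The abstract names SMOTE but gives no implementation details, so the following is only a minimal sketch of the general technique, not the authors' pipeline. It creates each synthetic minority-class sample by interpolating between a real minority sample and one of its nearest minority-class neighbours. The function name `smote`, the neighbour count `k`, and the toy abundance values are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority-class samples by SMOTE-style interpolation.

    minority    : list of feature vectors (lists of floats), minority class only
    n_synthetic : number of synthetic samples to generate
    k           : number of nearest minority neighbours to choose from (assumed value)
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data: four minority samples with three hypothetical protein abundances each.
minority_samples = [
    [1.0, 2.0, 0.5],
    [1.2, 1.8, 0.7],
    [0.9, 2.2, 0.4],
    [1.1, 2.1, 0.6],
]
new_samples = smote(minority_samples, n_synthetic=6, k=2)
```

In a cross-validated setting, oversampling of this kind is normally applied to the training folds only, so that synthetic points derived from a test sample cannot leak into training.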

RESULTS

We present a machine learning workflow for high-dimensional TMT proteomics data that addresses the inherent characteristics of such data. Our results demonstrate that batch effect correction has little or no impact on model performance, whereas robust feature selection is critical for model stability and performance, especially in the high-dimensional proteomics setting for AD diagnostics. The results further indicate that removing features with missing values produces stronger models than imputing them. Our best-performing disease-progression detection model, a random forest, achieves an AUC of 0.84 (± 0.03).
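The two missing-value strategies compared in the results can be illustrated with a small sketch. The abstract does not state which imputation method or removal threshold was used, so median imputation and "drop any feature with at least one missing value" are assumptions here, as are the function names and the toy matrix (rows are samples, columns are features, `None` marks a missing measurement).

```python
import statistics


def drop_incomplete_features(rows):
    """Keep only the feature columns that are observed in every sample."""
    n_features = len(rows[0])
    keep = [j for j in range(n_features) if all(r[j] is not None for r in rows)]
    return [[r[j] for j in keep] for r in rows], keep


def impute_median(rows):
    """Replace each missing value with the median of the observed values in its column."""
    n_features = len(rows[0])
    medians = []
    for j in range(n_features):
        observed = [r[j] for r in rows if r[j] is not None]
        # Fall back to 0.0 for a column with no observations (an arbitrary choice).
        medians.append(statistics.median(observed) if observed else 0.0)
    return [
        [medians[j] if r[j] is None else r[j] for j in range(n_features)]
        for r in rows
    ]


# Toy matrix: four samples, three hypothetical protein features.
samples = [
    [10.0, None, 3.0],
    [12.0, 5.0, 4.0],
    [11.0, None, None],
    [13.0, 7.0, 5.0],
]
complete, kept = drop_incomplete_features(samples)  # only column 0 survives
imputed = impute_median(samples)                    # all columns kept, gaps filled
```

Either output can then be passed to the same classifier, which makes the comparison between removal and imputation a like-for-like one.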

CONCLUSION

We identify several novel protein biomarker candidates, such as FABP3 and GOT1, with potential diagnostic value for detecting AD pathology. This suggests a need for different biomarkers when diagnosing AD in patients with iNPH, and for considering different biomarkers for ventricular and lumbar CSF samples. This work underscores the importance of a meticulous machine learning process in enhancing biomarker discovery. Our study also provides insights into translating biomarkers from other central nervous system diseases such as iNPH, and into using both ventricular and lumbar CSF samples for biomarker discovery, providing a foundation for future research and clinical applications.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e61/11874791/41cf55781a64/12987_2025_634_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e61/11874791/3b4870bad36c/12987_2025_634_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e61/11874791/c8275766d450/12987_2025_634_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e61/11874791/2f3df5a740e3/12987_2025_634_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e61/11874791/1d5b43b70686/12987_2025_634_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e61/11874791/c4df2f2b7d0d/12987_2025_634_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e61/11874791/340276fbf579/12987_2025_634_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e61/11874791/62a3d60c92b4/12987_2025_634_Fig8_HTML.jpg
