Machine Learning Methods for Classifying Multiple Sclerosis and Alzheimer's Disease Using Genomic Data.

Authors

Arnal Segura Magdalena, Bini Giorgio, Krithara Anastasia, Paliouras Georgios, Tartaglia Gian Gaetano

Affiliations

Centre for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen, 83, 16152 Genova, Italy.

Department of Biology 'Charles Darwin', Sapienza University of Rome, P.le A. Moro 5, 00185 Rome, Italy.

Publication information

Int J Mol Sci. 2025 Feb 27;26(5):2085. doi: 10.3390/ijms26052085.

DOI:10.3390/ijms26052085
PMID:40076709
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11900513/

Abstract

Complex diseases pose challenges in prediction due to their multifactorial and polygenic nature. This study employed machine learning (ML) to analyze genomic data from the UK Biobank, aiming to predict the genomic predisposition to complex diseases like multiple sclerosis (MS) and Alzheimer's disease (AD). We tested logistic regression (LR), ensemble tree methods, and deep learning models for this purpose. LR displayed remarkable stability across various subsets of data, outshining deep learning approaches, which showed greater variability in performance. Additionally, ML methods demonstrated an ability to maintain optimal performance despite correlated genomic features due to linkage disequilibrium. When comparing the performance of polygenic risk score (PRS) with ML methods, PRS consistently performed at an average level. By employing explainability tools in the ML models of MS, we found that the results confirmed the polygenicity of this disease. The highest-prioritized genomic variants in MS were identified as expression or splicing quantitative trait loci located in non-coding regions within or near genes associated with the immune response, with a prevalence of human leukocyte antigen (HLA) gene annotations. Our findings shed light on both the potential and the challenges of employing ML to capture complex genomic patterns, paving the way for improved predictive models.
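
As a rough illustration of the comparison the abstract describes, the sketch below contrasts a polygenic risk score (a fixed weighted sum of allele dosages) with a logistic regression fitted to the same variants. It is not the authors' pipeline. The cohort is simulated with independent variants and no linkage disequilibrium, and every name in it (`simulate`, `TRUE_EFFECTS`, the effect sizes, the allele frequency) is a hypothetical choice made for the example. The PRS weights simply reuse the simulated effects as stand-ins for externally estimated GWAS weights.

```python
import math
import random


def polygenic_risk_score(dosages, weights):
    """PRS: weighted sum of effect-allele dosages (0, 1 or 2 per variant)."""
    return sum(w * g for w, g in zip(weights, dosages))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(X, y, step=0.1, epochs=200):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + polygenic_risk_score(xi, w)) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= step * grad_b / n
        w = [wj - step * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count half)."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def simulate(n, effects, maf=0.3, intercept=-0.5, seed=0):
    """Toy cohort: independent variants, disease status drawn from a logistic model."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        # Two independent allele draws per variant give a dosage of 0, 1 or 2.
        g = [(rng.random() < maf) + (rng.random() < maf) for _ in effects]
        risk = sigmoid(intercept + polygenic_risk_score(g, effects))
        X.append(g)
        y.append(1 if rng.random() < risk else 0)
    return X, y


# Hypothetical per-variant log-odds; the last two variants carry no signal.
TRUE_EFFECTS = [1.0, -1.0, 0.8, 0.0, 0.0]
X, y = simulate(400, TRUE_EFFECTS)
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

w, b = fit_logistic_regression(X_train, y_train)
lr_auc = auc([b + polygenic_risk_score(x, w) for x in X_test], y_test)
# Assumption: the simulated effects play the role of external GWAS weights.
prs_auc = auc([polygenic_risk_score(x, TRUE_EFFECTS) for x in X_test], y_test)
print(f"LR AUC: {lr_auc:.3f}  PRS AUC: {prs_auc:.3f}")
```

Re-running `simulate` with different `seed` values and refitting gives a simple way to see how stable each score is across data subsets, which is the property the abstract highlights for logistic regression.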

Cited by

1
TFProtBert: Detection of Transcription Factors Binding to Methylated DNA Using ProtBert Latent Space Representation.
Int J Mol Sci. 2025 Apr 29;26(9):4234. doi: 10.3390/ijms26094234.
