Revisiting genome-wide association studies from statistical modelling to machine learning.

Affiliations

Institute of Fundamental and Frontier Sciences at the University of Electronic Science and Technology of China, Chengdu, China.

College of Computer Science and Engineering, Northeast Forestry University, Harbin, China.

Publication information

Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa263.

DOI: 10.1093/bib/bbaa263
PMID: 33126243

Abstract

Over the last decade, genome-wide association studies (GWAS) have discovered thousands of genetic variants underlying complex human diseases and agriculturally important traits. These findings have been utilized to dissect the biological basis of diseases, to develop new drugs, to advance precision medicine and to boost breeding. However, the potential of GWAS is still underexploited due to methodological limitations. Many challenges have emerged, including detecting epistasis and single-nucleotide polymorphisms (SNPs) with small effects and distinguishing causal variants from other SNPs associated through linkage disequilibrium. These issues have motivated advancements in GWAS analyses in two contrasting cultures: statistical modelling and machine learning. In this review, we systematically present the basic concepts and the benefits and limitations in both methods. We further discuss recent efforts to mitigate their weaknesses. Additionally, we summarize the state-of-the-art tools for detecting the missed signals, ultrarare mutations and gene-gene interactions and for prioritizing SNPs. Our work can offer both theoretical and practical guidelines for performing GWAS analyses and for developing further new robust methods to fully exploit the potential of GWAS.
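
For readers unfamiliar with the statistical-modelling side of GWAS mentioned in the abstract, the following is a minimal sketch of a single-SNP association scan. It is not taken from the review. The function name `score_test`, the simulated data, the effect size and the Bonferroni threshold are all illustrative assumptions, and the SNPs are simulated independently, so linkage disequilibrium is not modelled. Each SNP is tested with a 1-df score statistic, chi2 = n * r^2, where r is the correlation between minor-allele count and phenotype.

```python
import math
import random


def score_test(genotype, phenotype):
    """1-df score test for an additive SNP effect: chi2 = n * r^2.

    genotype: minor-allele counts (0, 1 or 2) per individual.
    phenotype: quantitative trait values per individual.
    Returns (chi2, p_value).
    """
    n = len(genotype)
    g_mean = sum(genotype) / n
    y_mean = sum(phenotype) / n
    sgy = sum((g - g_mean) * (y - y_mean) for g, y in zip(genotype, phenotype))
    sgg = sum((g - g_mean) ** 2 for g in genotype)
    syy = sum((y - y_mean) ** 2 for y in phenotype)
    if sgg == 0 or syy == 0:
        return 0.0, 1.0  # monomorphic SNP or constant phenotype: no signal
    chi2 = n * sgy * sgy / (sgg * syy)
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Simulated data (assumption): 2000 individuals, 50 independent SNPs,
# one causal SNP with an additive effect of 0.5 on a unit-variance trait.
rng = random.Random(0)
n, m, causal = 2000, 50, 7
mafs = [rng.uniform(0.2, 0.5) for _ in range(m)]
genotypes = [
    [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n)]
    for maf in mafs
]
phenotype = [0.5 * genotypes[causal][i] + rng.gauss(0.0, 1.0) for i in range(n)]

pvals = [score_test(g, phenotype)[1] for g in genotypes]
threshold = 0.05 / m  # Bonferroni correction over m tests
top = min(range(m), key=pvals.__getitem__)
hits = [j for j, p in enumerate(pvals) if p < threshold]
print(f"top SNP: {top}, p = {pvals[top]:.3g}, significant SNPs: {hits}")
```

A scan of this kind tests one SNP at a time, which is why it can miss the interaction effects (epistasis) and small-effect variants that the abstract lists as open challenges.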

Similar articles

1. Revisiting genome-wide association studies from statistical modelling to machine learning.
   Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa263.
2. Detecting genetic interactions in pathway-based genome-wide association studies.
   Genet Epidemiol. 2014 May;38(4):300-9. doi: 10.1002/gepi.21803. Epub 2014 Apr 9.
3. A whole-genome simulator capable of modeling high-order epistasis for complex disease.
   Genet Epidemiol. 2013 Nov;37(7):686-94. doi: 10.1002/gepi.21761. Epub 2013 Oct 1.
4. Performance of epistasis detection methods in semi-simulated GWAS.
   BMC Bioinformatics. 2018 Jun 18;19(1):231. doi: 10.1186/s12859-018-2229-8.
5. Detecting epistasis in human complex traits.
   Nat Rev Genet. 2014 Nov;15(11):722-33. doi: 10.1038/nrg3747. Epub 2014 Sep 9.
6. Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests.
   BMC Genomics. 2015;16 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2164-16-S2-S5. Epub 2015 Jan 21.
7. Testing gene-gene interactions in genome wide association studies.
   Genet Epidemiol. 2014 Feb;38(2):123-34. doi: 10.1002/gepi.21786. Epub 2014 Jan 15.
8. Imperfect Linkage Disequilibrium Generates Phantom Epistasis (& Perils of Big Data).
   G3 (Bethesda). 2019 May 7;9(5):1429-1436. doi: 10.1534/g3.119.400101.
9. Selecting Closely-Linked SNPs Based on Local Epistatic Effects for Haplotype Construction Improves Power of Association Mapping.
   G3 (Bethesda). 2019 Dec 3;9(12):4115-4126. doi: 10.1534/g3.119.400451.
10. Genome-Wide Association Study Statistical Models: A Review.
    Methods Mol Biol. 2022;2481:43-62. doi: 10.1007/978-1-0716-2237-7_4.

Cited by

1. kGWASflow: a modular, flexible, and reproducible Snakemake workflow for k-mers-based GWAS.
   G3 (Bethesda). 2023 Dec 29;14(1). doi: 10.1093/g3journal/jkad246.
2. Are Ischemic Stroke and Alzheimer's Disease Genetically Consecutive Pathologies?
   Biomedicines. 2023 Oct 8;11(10):2727. doi: 10.3390/biomedicines11102727.
3. The interaction between Epstein-Barr virus and multiple sclerosis genetic risk loci: insights into disease pathogenesis and therapeutic opportunities.
   Clin Transl Immunology. 2023 Jun 17;12(6):e1454. doi: 10.1002/cti2.1454. eCollection 2023.
4. Robust SNP-based prediction of rheumatoid arthritis through machine-learning-optimized polygenic risk score.
   J Transl Med. 2023 Feb 7;21(1):92. doi: 10.1186/s12967-023-03939-5.
5. Prediction of Plant Resistance Proteins Based on Pairwise Energy Content and Stacking Framework.
   Front Plant Sci. 2022 May 31;13:912599. doi: 10.3389/fpls.2022.912599. eCollection 2022.
6. Genome-Wide Association Study Statistical Models: A Review.
   Methods Mol Biol. 2022;2481:43-62. doi: 10.1007/978-1-0716-2237-7_4.
7. Optimising Cardiometabolic Risk Factors in Pregnancy: A Review of Risk Prediction Models Targeting Gestational Diabetes and Hypertensive Disorders.
   J Cardiovasc Dev Dis. 2022 Feb 10;9(2):55. doi: 10.3390/jcdd9020055.
8. Functional coding haplotypes and machine-learning feature elimination identifies predictors of Methotrexate Response in Rheumatoid Arthritis patients.
   EBioMedicine. 2022 Jan;75:103800. doi: 10.1016/j.ebiom.2021.103800. Epub 2022 Jan 10.
9. Labels in a haystack: Approaches beyond supervised learning in biomedical applications.
   Patterns (N Y). 2021 Dec 10;2(12):100383. doi: 10.1016/j.patter.2021.100383.
10. Genome-Wide Association Studies of Soybean Yield-Related Hyperspectral Reflectance Bands Using Machine Learning-Mediated Data Integration Methods.
    Front Plant Sci. 2021 Nov 22;12:777028. doi: 10.3389/fpls.2021.777028. eCollection 2021.