A resource-efficient tool for mixed model association analysis of large-scale data.

Affiliations

Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia.

Institute for Advanced Research, Wenzhou Medical University, Wenzhou, Zhejiang, China.

Publication information

Nat Genet. 2019 Dec;51(12):1749-1755. doi: 10.1038/s41588-019-0530-8. Epub 2019 Nov 25.

DOI:10.1038/s41588-019-0530-8

PMID:31768069

Abstract

The genome-wide association study (GWAS) has been widely used as an experimental design to detect associations between genetic variants and a phenotype. Two major confounding factors, population stratification and relatedness, could potentially lead to inflated GWAS test statistics and hence to spurious associations. Mixed linear model (MLM)-based approaches can be used to account for sample structure. However, genome-wide association (GWA) analyses in biobank samples such as the UK Biobank (UKB) often exceed the capability of most existing MLM-based tools especially if the number of traits is large. Here, we develop an MLM-based tool (fastGWA) that controls for population stratification by principal components and for relatedness by a sparse genetic relationship matrix for GWA analyses of biobank-scale data. We demonstrate by extensive simulations that fastGWA is reliable, robust and highly resource-efficient. We then apply fastGWA to 2,173 traits on array-genotyped and imputed samples from 456,422 individuals and to 2,048 traits on whole-exome-sequenced samples from 46,191 individuals in the UKB.
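
The model outlined in the abstract (fixed-effect covariates such as principal components, plus a random effect whose covariance is a sparse genetic relationship matrix) can be illustrated with a minimal sketch. This is not the fastGWA implementation. It assumes the two variance components are already known (a real tool would have to estimate them from the data), it runs an exact generalized least squares test for every variant rather than any optimized shortcut, and the function name `sparse_mlm_assoc`, its parameters, and the simulated data are all hypothetical.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu
from scipy.stats import chi2


def sparse_mlm_assoc(y, covars, genotypes, grm, sigma_g2, sigma_e2):
    """Per-variant GLS test under V = sigma_g2 * GRM + sigma_e2 * I.

    Hypothetical helper for illustration only. Variance components are
    taken as given (an assumption); covars should include an intercept
    and any principal components.
    Returns an array with one row per variant: (beta, se, p_value).
    """
    n = y.shape[0]
    # Phenotypic covariance; it stays sparse because the GRM is sparse.
    V = (sigma_g2 * grm + sigma_e2 * sparse.identity(n)).tocsc()
    lu = splu(V)  # sparse LU factorization, reused for every solve below

    Vi_y = lu.solve(y)
    Vi_X = lu.solve(covars)
    XtViX_inv = np.linalg.inv(covars.T @ Vi_X)
    # P y, where P = V^-1 - V^-1 X (X' V^-1 X)^-1 X' V^-1
    Py = Vi_y - Vi_X @ (XtViX_inv @ (covars.T @ Vi_y))

    results = []
    for j in range(genotypes.shape[1]):
        g = genotypes[:, j]
        Vi_g = lu.solve(g)
        Pg = Vi_g - Vi_X @ (XtViX_inv @ (covars.T @ Vi_g))
        num = g @ Py          # g' P y
        den = g @ Pg          # g' P g
        beta = num / den
        se = 1.0 / np.sqrt(den)
        p_value = chi2.sf((beta / se) ** 2, df=1)
        results.append((beta, se, p_value))
    return np.array(results)


# Simulated toy data (assumption): 100 pairs of relatives with GRM entry 0.5.
rng = np.random.default_rng(0)
n, m = 200, 5
rows = np.arange(0, n, 2)
off = sparse.coo_matrix((np.full(n // 2, 0.5), (rows, rows + 1)), shape=(n, n))
grm = (sparse.identity(n) + off + off.T).tocsr()

covars = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 "PCs"
genotypes = rng.binomial(2, 0.3, size=(n, m)).astype(float)
y = 0.5 * genotypes[:, 0] + rng.normal(size=n)

res = sparse_mlm_assoc(y, covars, genotypes, grm, sigma_g2=0.4, sigma_e2=0.6)
print(res)  # columns: beta, se, p-value
```

Because the covariance matrix is factorized once and reused for every variant, the per-variant cost is a sparse triangular solve. With an identity relationship matrix the estimates reduce to ordinary least squares, which gives a simple sanity check.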

Similar articles

A generalized linear mixed model association tool for biobank-scale data.

Nat Genet. 2021 Nov;53(11):1616-1621. doi: 10.1038/s41588-021-00954-4. Epub 2021 Nov 4.

Whole-exome imputation within UK Biobank powers rare coding variant association and fine-mapping analyses.

Nat Genet. 2021 Aug;53(8):1260-1269. doi: 10.1038/s41588-021-00892-1. Epub 2021 Jul 5.

A fast and powerful linear mixed model approach for genotype-environment interaction tests in large-scale GWAS.

Brief Bioinform. 2023 Jan 19;24(1). doi: 10.1093/bib/bbac547.

Efficient identification of trait-associated loss-of-function variants in the UK Biobank cohort by exome-sequencing based genotype imputation.

Genet Epidemiol. 2023 Mar;47(2):121-134. doi: 10.1002/gepi.22511. Epub 2022 Dec 9.

Fine-scale population structure in the UK Biobank: implications for genome-wide association studies.

Hum Mol Genet. 2020 Sep 29;29(16):2803-2811. doi: 10.1093/hmg/ddaa157.

UK Biobank Whole-Exome Sequence Binary Phenome Analysis with Robust Region-Based Rare-Variant Test.

Am J Hum Genet. 2020 Jan 2;106(1):3-12. doi: 10.1016/j.ajhg.2019.11.012. Epub 2019 Dec 19.

Linkage Disequilibrium and Evaluation of Genome-Wide Association Mapping Models in Tetraploid Potato.

G3 (Bethesda). 2018 Oct 3;8(10):3185-3202. doi: 10.1534/g3.118.200377.

Scalable mixed model methods for set-based association studies on large-scale categorical data analysis and its application to exome-sequencing data in UK Biobank.

Am J Hum Genet. 2023 May 4;110(5):762-773. doi: 10.1016/j.ajhg.2023.03.010. Epub 2023 Apr 4.

A scalable variational inference approach for increased mixed-model association power.

Nat Genet. 2025 Feb;57(2):461-468. doi: 10.1038/s41588-024-02044-7. Epub 2025 Jan 9.

Cited by

A phenome-wide association and Mendelian randomization study for suicide attempt within UK Biobank.

Mol Psychiatry. 2025 Sep 2. doi: 10.1038/s41380-025-03214-7.

Genetically proxied blood pressure, vascular brain injury, and Alzheimer's disease pathology.

Alzheimers Dement. 2025 Jul;21(7):e70515. doi: 10.1002/alz.70515.

Multi-organ AI Endophenotypes Chart the Heterogeneity of Pan-disease in the Brain, Eye, and Heart.

medRxiv. 2025 Aug 13:2025.08.09.25333350. doi: 10.1101/2025.08.09.25333350.

Epigenetic age acceleration and midlife cognition: joint evidence from observational study and Mendelian randomization.

NPJ Aging. 2025 Aug 18;11(1):75. doi: 10.1038/s41514-025-00265-6.

Evaluating the Causal Effects of ADHD and Autism on Cardiovascular Diseases and Vice Versa: A Systematic Review and Meta-Analysis of Mendelian Randomization Studies.

Cells. 2025 Jul 31;14(15):1180. doi: 10.3390/cells14151180.

Genome-wide analyses reveal intricate genetic mechanisms underlying egg production efficiency in chickens.

J Anim Sci Biotechnol. 2025 Aug 11;16(1):114. doi: 10.1186/s40104-025-01245-2.

LDAK-KVIK performs fast and powerful mixed-model association analysis of quantitative and binary phenotypes.

Nat Genet. 2025 Aug 11. doi: 10.1038/s41588-025-02286-z.

Non-coding genetic elements of lung cancer identified using whole genome sequencing in 13,722 Chinese.

Nat Commun. 2025 Aug 9;16(1):7365. doi: 10.1038/s41467-025-62459-6.

Charting structural brain asymmetry across the human lifespan.

bioRxiv. 2025 Jul 24:2025.07.21.665924. doi: 10.1101/2025.07.21.665924.

Large-scale genome-wide analyses with proteomics integration reveal novel loci and biological insights into frailty.

Nat Aging. 2025 Aug;5(8):1589-1600. doi: 10.1038/s43587-025-00925-y. Epub 2025 Aug 5.
