A powerful and efficient set test for genetic markers that handles confounders.

Affiliation

eScience Group, Microsoft Research, Los Angeles, CA 90024, USA.

Publication information

Bioinformatics. 2013 Jun 15;29(12):1526-33. doi: 10.1093/bioinformatics/btt177. Epub 2013 Apr 18.

DOI: 10.1093/bioinformatics/btt177

PMID: 23599503

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3673214/

Abstract

MOTIVATION

Approaches for testing sets of variants, such as a set of rare or common variants within a gene or pathway, for association with complex traits are important. In particular, set tests allow for aggregation of weak signal within a set, can capture interplay among variants and reduce the burden of multiple hypothesis testing. Until now, these approaches did not address confounding by family relatedness and population structure, a problem that is becoming more important as larger datasets are used to increase power.

RESULTS

We introduce a new approach for set tests that handles confounders. Our model is based on the linear mixed model and uses two random effects: one to capture the set association signal and one to capture confounders. We also introduce a computational speedup for two-random-effects models that makes this approach feasible even for extremely large cohorts. Using this model with both the likelihood ratio test and the score test, we find that the former yields more power while controlling type I error. Application of our approach to the richly structured Genetic Analysis Workshop 14 data demonstrates that our method successfully corrects for population structure and family relatedness, whereas application of our method to a 15 000-individual Crohn's disease case-control cohort demonstrates that it additionally recovers genes not recoverable by univariate analysis.
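The two-random-effects model described above can be illustrated with a small sketch. It is not the authors' method or their library's API, and it rests on several assumptions. The phenotype y, with covariates X, is modelled as y ~ N(X b, s2 [(1 - h) I + h ((1 - a) K_conf + a K_set)]). Here K_conf is a realized relationship matrix built from background SNPs and plays the role of the confounder random effect. K_set is built from the SNPs in the tested set and plays the role of the set random effect.

The following choices are all illustrative assumptions:

- the (h, a) parametrization;
- the coarse grid search;
- the hypothetical function names `kinship`, `neg2_loglik`, `best_fit` and `set_lrt`.

The null hypothesis sets the set variance to zero (a = 0). The likelihood ratio statistic compares the best null fit with the best alternative fit. Because that variance sits on the boundary of the parameter space, a plain chi-square(1) reference does not apply. The 50:50 mixture of chi-square(0) and chi-square(1) used here is only a rough approximation, and the abstract does not state how the authors calibrated their p-values.

```python
# Hypothetical illustration of a two-random-effects LMM set test (LRT).
# Not the API of the authors' Python library; grid search and p-value
# approximation are assumptions made for this sketch.
import math

import numpy as np


def kinship(G):
    """Realized relationship matrix Z Z^T / m from an n x m genotype matrix,
    with each SNP column centered and scaled to unit variance."""
    sd = G.std(axis=0)
    sd[sd == 0] = 1.0  # guard against monomorphic SNPs
    Z = (G - G.mean(axis=0)) / sd
    return Z @ Z.T / G.shape[1]


def neg2_loglik(y, X, V):
    """-2 x ML log-likelihood of y ~ N(X b, s2 * V), with the fixed effects b
    (generalized least squares) and the scale s2 profiled out analytically."""
    n = y.shape[0]
    L = np.linalg.cholesky(V)
    yw = np.linalg.solve(L, y)  # whitening: L^{-1} y has covariance s2 * I
    Xw = np.linalg.solve(L, X)
    b, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    r = yw - Xw @ b
    s2 = r @ r / n
    logdet_V = 2.0 * np.sum(np.log(np.diag(L)))
    return n * math.log(2 * math.pi * s2) + logdet_V + n


def best_fit(y, X, K_conf, K_set=None):
    """Grid-search the minimum of -2 log L over covariances
    s2 * [(1 - h) I + h ((1 - a) K_conf + a K_set)].
    Passing K_set=None fits the null model (a = 0, no set random effect)."""
    n = y.shape[0]
    h_grid = np.linspace(0.0, 0.95, 20)  # h < 1 keeps V positive definite
    a_grid = [0.0] if K_set is None else np.linspace(0.0, 1.0, 11)
    best = math.inf
    for h in h_grid:
        for a in a_grid:
            K = K_conf if K_set is None else (1 - a) * K_conf + a * K_set
            V = (1 - h) * np.eye(n) + h * K
            best = min(best, neg2_loglik(y, X, V))
    return best


def set_lrt(y, X, K_conf, K_set):
    """LRT statistic for H0: set variance = 0, with an approximate p-value
    from a 50:50 mixture of chi2_0 and chi2_1 (an assumption, see text)."""
    # The null grid is contained in the alternative grid, so lrt >= 0.
    lrt = max(best_fit(y, X, K_conf) - best_fit(y, X, K_conf, K_set), 0.0)
    # P(chi2_1 > x) = erfc(sqrt(x / 2)); the chi2_0 half has all mass at 0.
    p = 0.5 * math.erfc(math.sqrt(lrt / 2)) if lrt > 0 else 1.0
    return lrt, p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    G_conf = rng.binomial(2, 0.3, size=(n, 500)).astype(float)  # background SNPs
    G_set = rng.binomial(2, 0.3, size=(n, 10)).astype(float)    # SNPs in the set
    K_conf, K_set = kinship(G_conf), kinship(G_set)
    X = np.ones((n, 1))  # intercept-only covariates
    Z_set = (G_set - G_set.mean(axis=0)) / G_set.std(axis=0)
    y = Z_set @ rng.normal(size=10) + rng.normal(size=n)  # simulated set effect
    print(set_lrt(y, X, K_conf, K_set))
```

This brute-force version refactorizes a dense n x n matrix at every grid point, at O(n^3) cost each time. That is exactly the expense the paper's computational speedup is meant to avoid for extremely large cohorts. The speedup itself is not reproduced here.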

AVAILABILITY

A Python-based library implementing our approach is available at http://mscompbio.codeplex.com.

Similar articles

Greater power and computational efficiency for kernel-based association testing of sets of genetic variants.

Bioinformatics. 2014 Nov 15;30(22):3206-14. doi: 10.1093/bioinformatics/btu504. Epub 2014 Jul 29.

Powerful Tests for Multi-Marker Association Analysis Using Ensemble Learning.

PLoS One. 2015 Nov 30;10(11):e0143489. doi: 10.1371/journal.pone.0143489. eCollection 2015.

RL-SKAT: An Exact and Efficient Score Test for Heritability and Set Tests.

Genetics. 2017 Dec;207(4):1275-1283. doi: 10.1534/genetics.117.300395. Epub 2017 Oct 12.

Integrate multiple traits to detect novel trait-gene association using GWAS summary data with an adaptive test approach.

Bioinformatics. 2019 Jul 1;35(13):2251-2257. doi: 10.1093/bioinformatics/bty961.

Unified tests for fine-scale mapping and identifying sparse high-dimensional sequence associations.

Bioinformatics. 2016 Feb 1;32(3):330-7. doi: 10.1093/bioinformatics/btv586. Epub 2015 Oct 12.

Gene, region and pathway level analyses in whole-genome studies.

Genet Epidemiol. 2010 Apr;34(3):222-231. doi: 10.1002/gepi.20452.

A knowledge-based method for association studies on complex diseases.

PLoS One. 2012;7(9):e44162. doi: 10.1371/journal.pone.0044162. Epub 2012 Sep 6.

Efficient set tests for the genetic analysis of correlated traits.

Nat Methods. 2015 Aug;12(8):755-8. doi: 10.1038/nmeth.3439. Epub 2015 Jun 15.

Pathway analysis comparison using Crohn's disease genome wide association studies.

BMC Med Genomics. 2010 Jun 28;3:25. doi: 10.1186/1755-8794-3-25.

Cited by

Fast kernel-based association testing of non-linear genetic effects for biobank-scale data.

Nat Commun. 2023 Aug 15;14(1):4936. doi: 10.1038/s41467-023-40346-2.

networkGWAS: a network-based approach to discover genetic associations.

Bioinformatics. 2023 Jun 1;39(6). doi: 10.1093/bioinformatics/btad370.

Dissecting Complex Traits Using Omics Data: A Review on the Linear Mixed Models and Their Application in GWAS.

Plants (Basel). 2022 Nov 28;11(23):3277. doi: 10.3390/plants11233277.

LDAK-GBAT: Fast and powerful gene-based association testing using summary statistics.

Am J Hum Genet. 2023 Jan 5;110(1):23-29. doi: 10.1016/j.ajhg.2022.11.010. Epub 2022 Dec 7.

Identifying interpretable gene-biomarker associations with functionally informed kernel-based tests in 190,000 exomes.

Nat Commun. 2022 Sep 10;13(1):5332. doi: 10.1038/s41467-022-32864-2.

Genome-Wide Association Studies Reveal Susceptibility Loci for Noninfectious Claw Lesions in Holstein Dairy Cattle.

Front Genet. 2021 May 28;12:657375. doi: 10.3389/fgene.2021.657375. eCollection 2021.

Gene-level quantitative trait mapping in Caenorhabditis elegans.

G3 (Bethesda). 2021 Feb 9;11(2). doi: 10.1093/g3journal/jkaa061.

Performance of model-based multifactor dimensionality reduction methods for epistasis detection by controlling population structure.

BioData Min. 2021 Feb 19;14(1):16. doi: 10.1186/s13040-021-00247-w.

RAINBOW: Haplotype-based genome-wide association study using a novel SNP-set method.

PLoS Comput Biol. 2020 Feb 14;16(2):e1007663. doi: 10.1371/journal.pcbi.1007663. eCollection 2020 Feb.

Gene-set association and epistatic analyses reveal complex gene interaction networks affecting flowering time in a worldwide barley collection.

J Exp Bot. 2019 Oct 24;70(20):5603-5616. doi: 10.1093/jxb/erz332.
