A data-driven weighting scheme for family-based genome-wide association studies.

Affiliation

Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, USA.

Publication information

Eur J Hum Genet. 2010 May;18(5):596-603. doi: 10.1038/ejhg.2009.201. Epub 2009 Nov 25.

DOI:10.1038/ejhg.2009.201

PMID:19935828

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2858789/

Abstract

Recently, Steen et al. proposed a novel two-stage approach for family-based genome-wide association studies. In the first stage, a test based on between-family information is used to rank SNPs according to their P-values or the conditional power of the test. In the second stage, the R most promising SNPs are tested using a family-based association test. We call this two-stage approach the top R method. Ionita-Laza et al. proposed an exponential weighting method within a two-stage framework. In its second stage, instead of testing only the top R SNPs, this approach tests all SNPs and weights the P-values of the association test according to the first-stage information. However, both the top R and the exponential weighting methods use the first-stage information only to rank SNPs, so neither appears to use that information efficiently. Furthermore, it may be unreasonable for the exponential weighting method to assign the same weight to all SNPs within a group when only one or a few SNPs are associated with the disease. In this article, we propose a data-driven weighting scheme within a two-stage framework, in which the first-stage information determines a SNP-specific weight for each SNP. We use simulation studies to evaluate the performance of our method. The simulation results show that the proposed method is consistently more powerful than the top R method and the exponential weighting method, regardless of the LD structure, population structure, and family structure.
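The contrast between testing only the top R SNPs and weighting every SNP can be illustrated with a small sketch. It assumes a weighted Bonferroni rule (reject SNP i when p_i <= alpha * w_i / m, with weights averaging 1), which is a common way to combine first-stage information with second-stage P-values. The abstract does not give the formula for the proposed data-driven weights, so `snp_specific_weights` below is a hypothetical stand-in that simply scales weights with the first-stage score; it is not the authors' scheme. All function names and the toy numbers are illustrative assumptions.

```python
import numpy as np


def top_r_weights(stage1_scores, r):
    """Express the top R method as weights: m/r for the r best-ranked SNPs, 0 otherwise.

    Testing r SNPs at level alpha/r is the same as a weighted Bonferroni rule
    with these weights, since alpha * (m/r) / m == alpha / r.
    """
    scores = np.asarray(stage1_scores, dtype=float)
    m = scores.size
    weights = np.zeros(m)
    top = np.argsort(-scores)[:r]  # indices of the r largest first-stage scores
    weights[top] = m / r
    return weights


def snp_specific_weights(stage1_scores):
    """Hypothetical SNP-specific weights (an assumption, not the paper's formula).

    Each weight is proportional to the SNP's first-stage score and the
    weights are normalised to average 1.
    """
    scores = np.asarray(stage1_scores, dtype=float)
    return scores / scores.mean()


def weighted_bonferroni(stage2_pvalues, weights, alpha=0.05):
    """Reject SNP i when p_i <= alpha * w_i / m (weights should average 1)."""
    p = np.asarray(stage2_pvalues, dtype=float)
    w = np.asarray(weights, dtype=float)
    return p <= alpha * w / p.size


# Toy data: first-stage scores (e.g. conditional power) and second-stage P-values.
stage1 = [0.8, 0.1, 0.6, 0.5]
stage2 = [0.015, 0.001, 0.004, 0.5]

reject_top_r = weighted_bonferroni(stage2, top_r_weights(stage1, r=2))
reject_weighted = weighted_bonferroni(stage2, snp_specific_weights(stage1))
print(reject_top_r.tolist())     # [True, False, True, False]
print(reject_weighted.tolist())  # [True, True, True, False]
```

In this toy example the second SNP has a very small second-stage P-value but a poor first-stage rank. The top R rule never tests it, whereas a scheme that gives every SNP a nonzero weight can still detect it.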

References

On the replication of genetic associations: timing can be everything!

Am J Hum Genet. 2008 Apr;82(4):849-58. doi: 10.1016/j.ajhg.2008.01.018.

Genomewide weighted hypothesis testing in family-based association studies, with an application to a 100K scan.

Am J Hum Genet. 2007 Sep;81(3):607-14. doi: 10.1086/519748. Epub 2007 Jul 17.

Two-stage association tests for genome-wide association studies based on family data with arbitrary family structure.

Eur J Hum Genet. 2007 Nov;15(11):1169-75. doi: 10.1038/sj.ejhg.5201902. Epub 2007 Jul 25.

Improving power in genome-wide association studies: weights tip the scale.

Genet Epidemiol. 2007 Nov;31(7):741-7. doi: 10.1002/gepi.20237.

Interpretation of simultaneous linkage and family-based association tests in genome screens.

Genet Epidemiol. 2007 Feb;31(2):134-42. doi: 10.1002/gepi.20196.

A fast method for computing high-significance disease association in large population-based studies.

Am J Hum Genet. 2006 Sep;79(3):481-92. doi: 10.1086/507317. Epub 2006 Jul 24.

A common genetic variant is associated with adult and childhood obesity.

Science. 2006 Apr 14;312(5771):279-83. doi: 10.1126/science.1124779.

Genomic screening and replication using the same data set in family-based association testing.

Nat Genet. 2005 Jul;37(7):683-91. doi: 10.1038/ng1582. Epub 2005 Jun 5.

Genetic analysis of genome-wide variation in human gene expression.

Nature. 2004 Aug 12;430(7001):743-7. doi: 10.1038/nature02797. Epub 2004 Jul 21.

Generating samples under a Wright-Fisher neutral model of genetic variation.

Bioinformatics. 2002 Feb;18(2):337-8. doi: 10.1093/bioinformatics/18.2.337.
