Genome-wide association studies with high-dimensional phenotypes.

Author information

Marttinen Pekka, Gillberg Jussi, Havulinna Aki, Corander Jukka, Kaski Samuel

Affiliation

Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Aalto, Finland

Publication information

Stat Appl Genet Mol Biol. 2013 Aug;12(4):413-31. doi: 10.1515/sagmb-2012-0032.

DOI:10.1515/sagmb-2012-0032

PMID:23759510

Abstract

High-dimensional phenotypes hold promise for richer findings in association studies, but testing of several phenotype traits aggravates the grand challenge of association studies, that of multiple testing. Several methods have recently been proposed for testing jointly all traits in a high-dimensional vector of phenotypes, with prospect of increased power to detect small effects that would be missed if tested individually. However, the methods have rarely been compared to the extent of enabling assessment of their relative merits and setting up guidelines on which method to use, and how to use it. We compare the methods on simulated data and with a real metabolomics data set comprising 137 highly correlated variables and approximately 550,000 SNPs. Applying the methods to genome-wide data with hundreds of thousands of markers inevitably requires division of the problem into manageable parts facilitating parallel processing, parts corresponding to individual genetic variants, pathways, or genes, for example. Here we utilize a straightforward formulation according to which the genome is divided into blocks of nearby correlated genetic markers, tested jointly for association with the phenotypes. This formulation is computationally feasible, reduces the number of tests, and lets the methods take advantage of combining information over several correlated variables not only on the phenotype side, but also on the genotype side. Our experiments show that canonical correlation analysis has higher power than alternative methods, while remaining computationally tractable for routine use in the GWAS setting, provided the number of samples is sufficient compared to the numbers of phenotype and genotype variables tested. Sparse canonical correlation analysis and regression models with latent confounding factors show promising performance when the number of samples is small compared to the dimensionality of the data.
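
The block formulation mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the blocks were constructed, so the rule used here is purely an assumption for illustration: a new block starts whenever the squared Pearson correlation between adjacent markers drops below a threshold, or when a block reaches a maximum size. The names `pearson`, `partition_into_blocks`, `r2_threshold`, and `max_block_size` are hypothetical and do not come from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def partition_into_blocks(genotypes, r2_threshold=0.5, max_block_size=10):
    """Split markers (given in genomic order) into blocks of adjacent correlated markers.

    genotypes: list of markers, each a list of allele counts (0/1/2) per sample.
    Returns a list of (start, end) index ranges, with end exclusive.
    ASSUMPTION: the adjacent-marker r^2 rule is illustrative, not the paper's procedure.
    """
    blocks = []
    start = 0
    for j in range(1, len(genotypes)):
        r2 = pearson(genotypes[j - 1], genotypes[j]) ** 2
        # Close the current block when correlation drops or the block is full.
        if r2 < r2_threshold or j - start >= max_block_size:
            blocks.append((start, j))
            start = j
    if genotypes:
        blocks.append((start, len(genotypes)))
    return blocks


# Toy data: 4 markers x 8 samples. Markers 0-1 are strongly correlated,
# markers 2-3 are identical, and markers 1 and 2 are nearly uncorrelated.
toy_genotypes = [
    [0, 1, 2, 0, 1, 2, 0, 1],
    [0, 1, 2, 0, 1, 2, 0, 2],
    [2, 2, 0, 0, 1, 1, 0, 2],
    [2, 2, 0, 0, 1, 1, 0, 2],
]

blocks = partition_into_blocks(toy_genotypes)
print(blocks)  # [(0, 2), (2, 4)]

# Fewer tests means a less severe multiple-testing correction (Bonferroni shown here).
alpha = 0.05
print("per-marker threshold:", alpha / len(toy_genotypes))
print("per-block threshold:", alpha / len(blocks))
```

Each resulting block would then be tested jointly against the phenotype vector by one of the multivariate methods compared in the paper; because blocks are independent units, they can be processed in parallel.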

Similar articles

Bivariate association analysis for quantitative traits using generalized estimation equation.

J Genet Genomics. 2009 Dec;36(12):733-43. doi: 10.1016/S1673-8527(08)60166-6.

Identification of association between disease and multiple markers via sparse partial least-squares regression.

Genet Epidemiol. 2011 Sep;35(6):479-86. doi: 10.1002/gepi.20596. Epub 2011 Jun 15.

Stability selection for genome-wide association.

Genet Epidemiol. 2011 Nov;35(7):722-8. doi: 10.1002/gepi.20623. Epub 2011 Aug 26.

A method combining a random forest-based technique with the modeling of linkage disequilibrium through latent variables, to run multilocus genome-wide association studies.

BMC Bioinformatics. 2018 Mar 27;19(1):106. doi: 10.1186/s12859-018-2054-0.

Semiparametric Allelic Tests for Mapping Multiple Phenotypes: Binomial Regression and Mahalanobis Distance.

Genet Epidemiol. 2015 Dec;39(8):635-50. doi: 10.1002/gepi.21930. Epub 2015 Oct 23.

A mixed two-stage method for detecting interactions in genomewide association studies.

J Theor Biol. 2010 Feb 21;262(4):576-83. doi: 10.1016/j.jtbi.2009.10.029. Epub 2009 Nov 6.

Alternative methods for H1 simulations in genome-wide association studies.

Hum Hered. 2012;73(2):95-104. doi: 10.1159/000336194. Epub 2012 Mar 28.

On optimal gene-based analysis of genome scans.

Genet Epidemiol. 2012 May;36(4):333-9. doi: 10.1002/gepi.21625. Epub 2012 Apr 16.

PSEA: Phenotype Set Enrichment Analysis--a new method for analysis of multiple phenotypes.

Genet Epidemiol. 2012 Apr;36(3):244-52. doi: 10.1002/gepi.21617.

Cited by

Multivariate analysis of genome-wide data to identify potential pleiotropic genes for five major psychiatric disorders using MetaCCA.

J Affect Disord. 2019 Jan 1;242:234-243. doi: 10.1016/j.jad.2018.07.046. Epub 2018 Jul 17.

MARV: a tool for genome-wide multi-phenotype analysis of rare variants.

BMC Bioinformatics. 2017 Feb 16;18(1):110. doi: 10.1186/s12859-017-1530-2.

Integrative regression network for genomic association study.

BMC Med Genomics. 2016 Aug 12;9 Suppl 1(Suppl 1):31. doi: 10.1186/s12920-016-0192-7.

metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis.

Bioinformatics. 2016 Jul 1;32(13):1981-9. doi: 10.1093/bioinformatics/btw052. Epub 2016 Feb 19.

Approaches for the identification of genetic modifiers of nutrient dependent phenotypes: examples from folate.

Front Nutr. 2014 Jul 14;1:8. doi: 10.3389/fnut.2014.00008. eCollection 2014.

Regularized machine learning in the genetic prediction of complex traits.

PLoS Genet. 2014 Nov 13;10(11):e1004754. doi: 10.1371/journal.pgen.1004754. eCollection 2014 Nov.

Penalized multimarker vs. single-marker regression methods for genome-wide association studies of quantitative traits.

Genetics. 2015 Jan;199(1):205-22. doi: 10.1534/genetics.114.167817. Epub 2014 Oct 28.

Assessing multivariate gene-metabolome associations with rare variants using Bayesian reduced rank regression.

Bioinformatics. 2014 Jul 15;30(14):2026-34. doi: 10.1093/bioinformatics/btu140. Epub 2014 Mar 24.
