G-STRATEGY: Optimal Selection of Individuals for Sequencing in Genetic Association Studies.

Authors

Wang Miaoyan, Jakobsdottir Johanna, Smith Albert V, McPeek Mary Sara

Affiliations

Department of Statistics, University of Chicago, Chicago, Illinois, United States of America.

Icelandic Heart Association, Kopavogur, Iceland.

Publication information

Genet Epidemiol. 2016 Sep;40(6):446-60. doi: 10.1002/gepi.21982. Epub 2016 Jun 3.

Abstract

In a large-scale genetic association study, the number of phenotyped individuals available for sequencing may, in some cases, be greater than the study's sequencing budget will allow. In that case, it can be important to prioritize individuals for sequencing in a way that optimizes power for association with the trait. Suppose a cohort of phenotyped individuals is available, with some subset of them possibly already sequenced, and one wants to choose an additional fixed-size subset of individuals to sequence in such a way that the power to detect association is maximized. When the phenotyped sample includes related individuals, power for association can be gained by including partial information, such as phenotype data of ungenotyped relatives, in the analysis, and this should be taken into account when assessing whom to sequence. We propose G-STRATEGY, which uses simulated annealing to choose a subset of individuals for sequencing that maximizes the expected power for association. In simulations, G-STRATEGY performs extremely well for a range of complex disease models and outperforms other strategies with, in many cases, relative power increases of 20-40% over the next best strategy, while maintaining correct type 1 error. G-STRATEGY is computationally feasible even for large datasets and complex pedigrees. We apply G-STRATEGY to data on high-density lipoprotein and low-density lipoprotein from the AGES-Reykjavik and REFINE-Reykjavik studies, in which G-STRATEGY is able to closely approximate the power of sequencing the full sample by selecting for sequencing only a small subset of the individuals.
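
The abstract describes the core idea as a search: pick a fixed-size subset of individuals that maximizes an objective, using simulated annealing. The sketch below illustrates that general technique only. It is not the authors' implementation: the function names (`anneal_subset`, `toy_score`), the swap-based proposal, the geometric cooling schedule, and especially the toy objective (squared phenotype residuals minus a penalty for selecting members of the same family) are assumptions made for illustration, standing in for the expected-power criterion that the paper defines.

```python
import math
import random


def anneal_subset(candidates, k, score, n_iter=5000, t_start=1.0, t_end=1e-3, seed=0):
    """Simulated annealing over size-k subsets; returns (best_subset, best_score)."""
    rng = random.Random(seed)
    items = list(candidates)
    if not 0 < k < len(items):
        raise ValueError("k must be between 1 and len(candidates) - 1")
    rng.shuffle(items)
    chosen, rest = items[:k], items[k:]  # random starting subset
    current = score(chosen)
    best, best_score = list(chosen), current
    for step in range(n_iter):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / max(1, n_iter - 1))
        i, j = rng.randrange(k), rng.randrange(len(rest))
        chosen[i], rest[j] = rest[j], chosen[i]  # propose swapping one member
        proposed = score(chosen)
        delta = proposed - current
        # Always accept improvements; accept worse moves with prob. exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = proposed
            if current > best_score:
                best, best_score = list(chosen), current
        else:
            chosen[i], rest[j] = rest[j], chosen[i]  # reject: undo the swap
    return best, best_score


def toy_score(subset, residual, family, penalty=0.5):
    """Hypothetical stand-in objective, NOT the paper's expected-power criterion."""
    info = sum(residual[i] ** 2 for i in subset)
    fams = [family[i] for i in subset]
    same_family_pairs = sum(fams.count(f) - 1 for f in fams) // 2
    return info - penalty * same_family_pairs


# Made-up illustrative data: 12 individuals in 4 families.
residual = [1.8, -0.2, 0.4, -1.5, 1.4, 0.1, -0.3, 2.1, -1.9, 0.6, 0.0, -1.1]
family = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]

best, best_score = anneal_subset(
    range(len(residual)), k=4, score=lambda s: toy_score(s, residual, family)
)
print(sorted(best), best_score)
```

Swapping one selected individual for one unselected individual keeps the subset size fixed, which matches the fixed sequencing budget described above. In a real application the objective would need to account for relatedness and for phenotype information from unsequenced relatives, as the abstract notes.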
