Rashkin Sara, Jun Goo, Chen Sai, Abecasis Goncalo R
Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America.
Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California, United States of America.
PLoS Genet. 2017 Jun 22;13(6):e1006811. doi: 10.1371/journal.pgen.1006811. eCollection 2017 Jun.
With the increasing focus of genetic association studies on the identification of trait-associated rare variants through sequencing, it is important to identify the most cost-effective sequencing strategies for these studies. Deep sequencing will accurately detect and genotype the most rare variants per individual, but may limit sample size. Low-pass sequencing will miss some variants in each individual but has been shown to provide a cost-effective alternative for studies of common variants. Here, we investigate the impact of sequencing depth on studies of rare variants, focusing on singletons: variants that are sampled in a single individual and are hardest to detect at low sequencing depths. We first estimate the sensitivity to detect singleton variants in both simulated data and in down-sampled deep genome and exome sequence data. We then explore the power of association studies comparing the burden of singleton variants in cases and controls under a variety of conditions. We show that the power to detect singletons increases with coverage, typically plateauing for coverage above roughly 25x. Next, we show that, when total sequencing capacity is fixed, the power of association studies focused on singletons is typically maximized for coverage of 15-20x, independent of relative risk, disease prevalence, singleton burden, and case-control ratio. Our results suggest a sequencing depth of 15-20x as an appropriate compromise between singleton detection power and sample size for studies of rare variants in complex disease.
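The trade-off described above, between per-sample detection sensitivity and the number of samples a fixed sequencing capacity can cover, can be illustrated with a minimal sketch. This is not the authors' simulation or variant-calling pipeline. It assumes, purely for illustration, that a heterozygous singleton is called when at least `min_alt_reads` reads carry the alternate allele, and that the alternate read count is Poisson distributed with mean depth / 2. The function names, the read threshold of 3, and the capacity of 30000 are all hypothetical. The model ignores false positives, genotyping error, and the association test itself, so it is not expected to reproduce the 15-20x optimum reported in the abstract.

```python
import math


def singleton_sensitivity(depth, min_alt_reads=3):
    """Chance of seeing at least `min_alt_reads` alternate-allele reads at a
    heterozygous site, assuming alternate read count ~ Poisson(depth / 2).
    Both the threshold and the Poisson model are illustrative assumptions."""
    lam = depth / 2.0
    p_too_few = sum(
        math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


def samples_affordable(total_capacity, depth):
    """Number of individuals that fit in a fixed budget, where
    total_capacity is measured in (individuals x depth) units."""
    return int(total_capacity // depth)


def tradeoff_table(total_capacity, depths, min_alt_reads=3):
    """Return (depth, n_samples, sensitivity) rows for each candidate depth."""
    return [
        (d, samples_affordable(total_capacity, d),
         singleton_sensitivity(d, min_alt_reads))
        for d in depths
    ]


if __name__ == "__main__":
    # Hypothetical capacity: enough for 1000 genomes at 30x.
    for depth, n, sens in tradeoff_table(30000, [4, 8, 15, 20, 30]):
        print(f"{depth:>3}x  n={n:>5}  sensitivity={sens:.3f}")
```

Under these assumptions, sensitivity rises steeply at low depth and flattens at high depth, while the affordable sample size falls in inverse proportion to depth. That is the tension the study quantifies with more realistic simulations and down-sampled sequence data.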