Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.
Am J Hum Genet. 2021 Apr 1;108(4):656-668. doi: 10.1016/j.ajhg.2021.03.012. Epub 2021 Mar 25.
Genetic studies in underrepresented populations identify disproportionate numbers of novel associations. However, most genetic studies use genotyping arrays and sequenced reference panels that best capture variation most common in European ancestry populations. To compare data generation strategies best suited for underrepresented populations, we sequenced the whole genomes of 91 individuals to high coverage as part of the Neuropsychiatric Genetics of African Population-Psychosis (NeuroGAP-Psychosis) study with participants from Ethiopia, Kenya, South Africa, and Uganda. We used a downsampling approach to evaluate the quality of two cost-effective data generation strategies, GWAS arrays versus low-coverage sequencing, by calculating the concordance of imputed variants from these technologies with those from deep whole-genome sequencing data. We show that low-coverage sequencing at a depth of ≥4× captures variants of all frequencies more accurately than all commonly used GWAS arrays investigated and at a comparable cost. Lower depths of sequencing (0.5-1×) performed comparably to commonly used low-density GWAS arrays. Low-coverage sequencing is also sensitive to novel variation; 4× sequencing detects 45% of singletons and 95% of common variants identified in high-coverage African whole genomes. Low-coverage sequencing approaches surmount the problems induced by the ascertainment of common genotyping arrays, effectively identify novel variation particularly in underrepresented populations, and present opportunities to enhance variant discovery at a cost similar to traditional approaches.
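The concordance comparison described above can be illustrated with a minimal sketch. The abstract does not specify the exact metric, so this assumes one common choice, non-reference concordance, computed over genotypes coded as alternate-allele counts (0, 1, 2). The function name and the toy genotype vectors are hypothetical and are not data from the study.

```python
from typing import Sequence


def non_reference_concordance(truth: Sequence[int], imputed: Sequence[int]) -> float:
    """Share of matching genotype calls among informative sites.

    Genotypes are alternate-allele counts (0, 1, or 2). A site is
    informative when at least one of the two call sets is non-reference,
    so the abundant homozygous-reference matches do not inflate the score.
    """
    if len(truth) != len(imputed):
        raise ValueError("call sets must cover the same sites")

    # Keep only sites where either the truth or the imputed call is non-reference.
    informative = [(t, i) for t, i in zip(truth, imputed) if t != 0 or i != 0]
    if not informative:
        return float("nan")  # no non-reference calls to compare

    matches = sum(1 for t, i in informative if t == i)
    return matches / len(informative)


# Toy example (made-up values): "truth" stands in for deep whole-genome
# sequencing calls, the other two for imputed calls from two strategies.
truth = [0, 1, 2, 0, 1, 0, 2, 1]
imputed_low_coverage = [0, 1, 2, 0, 1, 0, 1, 1]
imputed_array = [0, 1, 1, 0, 0, 1, 2, 1]

print(non_reference_concordance(truth, imputed_low_coverage))
print(non_reference_concordance(truth, imputed_array))
```

In practice such a metric is usually stratified by allele frequency bin, since the abstract reports accuracy across variants of all frequencies; that stratification is omitted here for brevity.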