GenOmics, Bioinformatics, and Translational Research Center, RTI International, Research Triangle Park, NC, USA.
Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA.
Commun Biol. 2022 Aug 11;5(1):806. doi: 10.1038/s42003-022-03738-6.
Genome-wide association studies (GWAS) have made impactful discoveries for complex diseases, often by amassing very large sample sizes. Yet GWAS of many diseases remain underpowered, especially for non-European ancestries. One cost-effective way to increase sample size is to combine existing cohorts, which may have limited sample size or be case-only, with public controls. This approach is limited, however, by the need for a large overlap in variants across genotyping arrays and by the scarcity of non-European controls. We developed and validated a protocol, Genotyping Array-WGS Merge (GAWMerge), that combines genotypes from arrays and whole-genome sequencing (WGS), ensures complete variant overlap, and allows diverse samples, such as those from Trans-Omics for Precision Medicine (TOPMed), to be used. Our protocol involves phasing, imputation, and filtering. We illustrated its ability to control technology-driven artifacts and type I error, as well as to recover known disease-associated signals across technologies, independent datasets, and ancestries in smoking-related cohorts. GAWMerge enables genetic studies to leverage existing cohorts to validly increase sample size and enhance discovery for understudied traits and ancestries.
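The abstract names phasing, imputation, and filtering but not the tools or thresholds involved. As a hedged illustration of how imputing every dataset to a common reference makes complete variant overlap achievable, the Python sketch below covers only a final filter-and-intersect step: keep variants that are present in every dataset and pass an imputation-quality cutoff in each. The `ImputedVariant` record, its `r2` field, the function name, and the 0.8 cutoff are hypothetical choices, not GAWMerge's published parameters. Phasing and imputation themselves are assumed to have been run upstream with dedicated tools, and directly sequenced calls are assumed to carry a quality of 1.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImputedVariant:
    """Hypothetical per-variant record produced after imputation."""
    chrom: str
    pos: int
    ref: str
    alt: str
    r2: float  # imputation quality estimate (assumed 1.0 for directly sequenced calls)


def variant_key(v: ImputedVariant) -> tuple:
    """Identify a variant by chromosome, position, and alleles."""
    return (v.chrom, v.pos, v.ref, v.alt)


def shared_high_quality_variants(datasets, min_r2=0.8):
    """Return sorted keys of variants present in every dataset with r2 >= min_r2 in each.

    `min_r2` is an illustrative cutoff, not a value taken from the paper.
    """
    shared = None
    for variants in datasets:
        # Variants passing the quality filter in this dataset
        passing = {variant_key(v) for v in variants if v.r2 >= min_r2}
        # Intersect with the variants that passed in all previous datasets
        shared = passing if shared is None else shared & passing
    return sorted(shared or set())
```

For example, if an imputed array cohort has one well-imputed and one poorly imputed site and the WGS controls cover both plus an extra site, only the well-imputed site shared by both datasets survives. Intersecting after a per-dataset quality filter keeps the merged case-control set on exactly the same variants, which is the property the protocol aims to guarantee.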