Sun Quan, Liu Weifang, Rosen Jonathan D, Huang Le, Pace Rhonda G, Dang Hong, Gallins Paul J, Blue Elizabeth E, Ling Hua, Corvol Harriet, Strug Lisa J, Bamshad Michael J, Gibson Ronald L, Pugh Elizabeth W, Blackman Scott M, Cutting Garry R, O'Neal Wanda K, Zhou Yi-Hui, Wright Fred A, Knowles Michael R, Wen Jia, Li Yun
Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
HGG Adv. 2022 Jan 11;3(2):100090. doi: 10.1016/j.xhgg.2022.100090. eCollection 2022 Apr 14.
Cystic fibrosis (CF) is a severe genetic disorder that can cause multiple comorbidities affecting the lungs, the pancreas, the luminal digestive system, and beyond. In our previous genome-wide association studies (GWAS), we genotyped approximately 8,000 CF samples using a mixture of genotyping platforms. More recently, the Cystic Fibrosis Genome Project (CFGP) performed deep (approximately 30×) whole genome sequencing (WGS) of 5,095 samples to better understand the genetic mechanisms underlying clinical heterogeneity among patients with CF. For mixtures of GWAS array and WGS data, genotype imputation has proven effective at increasing the effective sample size. Therefore, we first imputed the approximately 8,000 CF samples with GWAS array genotypes using the Trans-Omics for Precision Medicine (TOPMed) freeze 8 reference panel. Our results demonstrate that TOPMed can provide high-quality imputation for patients with CF, boosting genomic coverage from approximately 0.3-4.2 million genotyped markers to approximately 11-43 million well-imputed markers and significantly improving polygenic risk score (PRS) prediction accuracy. Furthermore, we built a CF-specific CFGP reference panel based on WGS data from patients with CF. We demonstrate that, despite having approximately 3% of the sample size of TOPMed, our CFGP reference panel can still outperform TOPMed when imputing some CF disease-causing variants, likely owing to allele and haplotype differences between patients with CF and general populations. We anticipate that our imputed data for 4,656 samples without WGS data will benefit our subsequent genetic association studies, and that the CFGP reference panel built from CF WGS samples will benefit other investigators studying CF.