Fakhro Khalid A, Yousri Noha A, Rodriguez-Flores Juan L, Robay Amal, Staudt Michelle R, Agosto-Perez Francisco, Salit Jacqueline, Malek Joel A, Suhre Karsten, Jayyousi Amin, Zirie Mahmoud, Stadler Dora, Mezey Jason G, Crystal Ronald G
Department of Genetic Medicine, Weill Cornell Medical College in Qatar, Doha, Qatar.
Division of Translational Medicine, Sidra Medical Research Centre, Doha, Qatar.
BMC Genomics. 2015 Oct 22;16:834. doi: 10.1186/s12864-015-1991-5.
The populations of the Arabian Peninsula remain the least represented in public genetic databases, both in terms of single nucleotide variants and of larger genomic mutations. We present the first high-resolution copy number variation (CNV) map for a Gulf Arab population, using a hybrid approach that integrates array genotyping intensity data and next-generation sequencing reads to call CNVs in the Qatari population.
CNVs were detected in 97 unrelated Qatari individuals by running two calling algorithms on each of two primary datasets: high-resolution genotyping (Illumina Omni 2.5M) and high-depth whole-genome sequencing (Illumina PE 100bp). The four call-sets were integrated to identify high-confidence CNV regions, which were then annotated for putative functional effect and compared to public databases of CNVs in other populations. The genome sequence data were also used to identify tagging SNPs in high linkage disequilibrium (LD) with common deletions in this population, so that these deletions can be imputed from future genotyping experiments.
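The integration of several call-sets into consensus regions can be illustrated with a minimal sketch. The abstract does not state the integration rule, so the one used here is an assumption: a stretch of genome counts as high confidence when at least `min_support` distinct call-sets cover it. The function name, the call-set labels, and the half-open `(chrom, start, end)` interval convention are likewise hypothetical.

```python
from collections import defaultdict


def high_confidence_regions(callsets, min_support=2):
    """Return regions covered by at least `min_support` distinct call-sets.

    callsets: dict mapping a call-set name to a list of (chrom, start, end)
    half-open intervals. The support rule is an assumption for illustration,
    not the rule used in the study.
    """
    events = defaultdict(list)
    for name, calls in callsets.items():
        for chrom, start, end in calls:
            events[chrom].append((start, 1, name))   # interval opens
            events[chrom].append((end, -1, name))    # interval closes

    regions = []
    for chrom in sorted(events):
        open_count = defaultdict(int)  # open intervals per call-set
        region_start = None
        # Sorting puts closes (-1) before opens (+1) at the same position,
        # which matches half-open interval semantics.
        for pos, delta, name in sorted(events[chrom]):
            open_count[name] += delta
            support = sum(1 for c in open_count.values() if c > 0)
            if region_start is None and support >= min_support:
                region_start = pos
            elif region_start is not None and support < min_support:
                last = regions[-1] if regions else None
                if last and last[0] == chrom and last[2] == region_start:
                    # Extend the previous region when the two are contiguous.
                    regions[-1] = (chrom, last[1], pos)
                else:
                    regions.append((chrom, region_start, pos))
                region_start = None
    return regions


# Hypothetical call-sets: two array-based and two sequencing-based callers.
example = {
    "array_a": [("chr1", 100, 500)],
    "array_b": [("chr1", 300, 700)],
    "seq_a": [("chr1", 400, 600), ("chr2", 10, 50)],
    "seq_b": [("chr2", 1000, 2000)],
}
consensus = high_confidence_regions(example, min_support=2)
```

In this toy input only the chr1 span shared by at least two callers is retained; calls seen by a single caller are dropped. Real pipelines often use reciprocal-overlap thresholds instead of simple coverage.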
Genotyping intensities and genome sequencing data from 97 Qataris were analyzed with four different algorithms and integrated to discover 16,660 high-confidence CNV regions (CNVRs) in the total population, affecting ~28 Mb in the median Qatari genome. Up to 40% of all CNVs affected genes, including novel CNVs affecting Mendelian disease genes, segregating at different frequencies in the three major Qatari subpopulations: those with Bedouin, Persian/South Asian, and African ancestry. Consistent with high consanguinity levels in the Bedouin subpopulation, we found an increased burden of homozygous deletions in this group. Compared with known CNVs in the comprehensive Database of Genomic Variants, 5% of all CNVRs in Qataris were completely novel, with an enrichment in the Qatari cohort of CNVs affecting several known chromosomal disorder loci and genes known to regulate sugar metabolism and type 2 diabetes. Finally, we used the genome sequence data to find suitable tagging SNPs for common deletions in this population.
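As a rough illustration of how tagging SNPs for a common deletion might be selected, the sketch below scores each candidate SNP by the squared Pearson correlation between its genotype dosages (0, 1, or 2 alternate alleles) and the deletion dosages (0, 1, or 2 deleted copies) across individuals. This dosage-based r² is a common approximation to LD for unphased data and is an assumption here, as are the 0.8 threshold, the function names, and the SNP identifiers; the abstract does not describe the method actually used.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two dosage vectors."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("dosage vectors must be non-empty and equal in length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic variant carries no LD information
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def tagging_snps(deletion, snps, min_r2=0.8):
    """Return (snp_id, r2) pairs with r2 >= min_r2, best first.

    deletion: list of deletion dosages per individual.
    snps: dict mapping a SNP id to its list of genotype dosages,
          in the same individual order as `deletion`.
    """
    scored = [(snp_id, genotype_r2(deletion, dosages))
              for snp_id, dosages in snps.items()]
    tags = [(snp_id, r2) for snp_id, r2 in scored if r2 >= min_r2]
    return sorted(tags, key=lambda pair: -pair[1])


# Hypothetical data for six individuals.
deletion_dosage = [0, 1, 2, 0, 1, 2]
candidate_snps = {
    "snp_perfect": [0, 1, 2, 0, 1, 2],   # tracks the deletion exactly
    "snp_flipped": [2, 1, 0, 2, 1, 0],   # same information, opposite allele
    "snp_unlinked": [0, 1, 0, 1, 0, 1],  # uncorrelated with the deletion
    "snp_mono": [0, 0, 0, 0, 0, 0],      # no variation
}
tags = tagging_snps(deletion_dosage, candidate_snps)
```

A SNP that passes the threshold could then stand in for the deletion on a genotyping array, which is the idea behind imputing deletions from SNP data.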
We combine four independently generated datasets from 97 individuals to study CNVs at high resolution for the first time in a Gulf Arab population.