Alsmadi Osama, John Sumi E, Thareja Gaurav, Hebbar Prashantha, Antony Dinu, Behbehani Kazem, Thanaraj Thangavel Alphonse
Dasman Diabetes Institute, Dasman, Kuwait.
PLoS One. 2014 Jun 4;9(6):e99069. doi: 10.1371/journal.pone.0099069. eCollection 2014.
The population of the State of Kuwait is composed of three genetic subgroups of inferred Persian, Saudi Arabian tribe and Bedouin ancestry. The Saudi Arabian tribe subgroup traces its origin to the Najd region of Saudi Arabia. By sequencing two whole genomes and thirteen exomes from this subgroup at high coverage (>40X), we identify 4,950,724 single nucleotide polymorphisms (SNPs), 515,802 indels and 39,762 structural variations. Of the identified variants, 10,098 (8.3%) exomic SNPs, 139,923 (2.9%) non-exomic SNPs, 5,256 (54.3%) exomic indels, and 374,959 (74.08%) non-exomic indels are 'novel'. Up to 8,070 (79.9%) of the reported novel biallelic exomic SNPs occur at low frequency (minor allele frequency <5%). We observe 5,462 known and 1,004 novel potentially deleterious nonsynonymous SNPs. Allele frequencies of common SNPs from the 15 exomes are significantly correlated with those from genotype data of a larger cohort of 48 individuals (Pearson correlation coefficient, 0.91; p < 2.2×10⁻¹⁶). A set of 2,485 SNPs shows significantly different allele frequencies when compared to populations from other continents. Two notable variants whose risk alleles occur at high frequency in this subgroup are: a nonsynonymous deleterious SNP (rs2108622 [19:g.15990431C>T] in the CYP4F2 gene [MIM:*604426]) associated with the warfarin dosage [MIM:#122700] required to elicit a normal anticoagulant response; and a 3' UTR SNP (rs6151429 [22:g.51063477T>C] in the ARSA gene [MIM:*607574]) associated with Metachromatic Leukodystrophy [MIM:#250100]. The Hemoglobin Riyadh variant (first identified in a Saudi Arabian woman) is observed in the exome data. The mitochondrial haplogroup profiles of the 15 individuals are consistent with the haplogroup diversity seen in Saudi Arabian natives, who are believed to have received substantial gene flow of African and eastern provenance.
We present the first genome resource for the Saudi Arabian tribe subgroup, which is imperative for designing future genetic studies in this subgroup. The full-length genome sequences and the identified variants are available at ftp://dgr.dasmaninstitute.org and http://dgr.dasmaninstitute.org/DGR/gb.html.
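The two allele-frequency quantities mentioned in the abstract (the minor allele frequency threshold of 5% and the Pearson correlation between exome-derived and genotype-derived allele frequencies) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the allele counts, and the cohort frequencies below are hypothetical toy values chosen only to show the arithmetic, and the sketch does not compute a p-value.

```python
from math import sqrt


def minor_allele_frequency(alt_count, total_alleles):
    """Minor allele frequency at a biallelic site: the smaller of the
    alternate and reference allele frequencies."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    alt_freq = alt_count / total_alleles
    return min(alt_freq, 1.0 - alt_freq)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical alternate-allele counts at five sites.
# 15 diploid exomes give 30 alleles per site; 48 genotyped
# individuals give 96 alleles per site.
exome_alt_counts = [1, 3, 12, 27, 15]
cohort_alt_counts = [4, 11, 36, 85, 50]

exome_freqs = [c / 30 for c in exome_alt_counts]
cohort_freqs = [c / 96 for c in cohort_alt_counts]

# Flag sites whose minor allele frequency is below the 5% threshold.
mafs = [minor_allele_frequency(c, 30) for c in exome_alt_counts]
low_frequency = [maf < 0.05 for maf in mafs]

# Agreement between the two sets of allele frequency estimates.
r = pearson_r(exome_freqs, cohort_freqs)
```

With only 30 alleles per site, a single observed copy already corresponds to a frequency of about 3.3%, so in a sample of this size the "low frequency" class consists of variants seen exactly once.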