Dr Panjwani Center for Molecular Medicine and Drug Research, International Center for Chemical and Biological Sciences, University of Karachi, Karachi, Pakistan.
J Hum Genet. 2013 Sep;58(9):622-6. doi: 10.1038/jhg.2013.72. Epub 2013 Jul 11.
We sequenced the genome of a Pakistani male at 25.5x coverage using massively parallel sequencing technology. More than 90% of the sequence reads were mapped to the human reference genome. In subsequent analysis, we identified 3,224,311 single-nucleotide polymorphisms (SNPs), of which 388,532 (12% of the total SNPs) had not been previously recorded in the Single Nucleotide Polymorphism database (dbSNP) or the 1000 Genomes Project database. The 5991 non-synonymous coding variants were screened for deleterious or disease-associated SNPs. Gene Ontology (GO) analysis of genes carrying deleterious SNPs identified 'retinoic acid signaling' and 'regulation of transcription' as enriched terms. Scanning the non-synonymous SNPs against OMIM revealed several disease- and phenotype-associated variants in the Pakistani genome. Comparative analysis with the Indian genome sequence revealed >1.8 million shared SNPs, 32% of which were annotated in ~14,000 genes. GO term analysis of these genes identified 'response to jasmonic acid stimulus', 'aminoglycoside antibiotic metabolic process' and 'glycoside metabolic process' as considerably enriched. A total of 59,558 small indels (1-5 bp) and 16,063 large structural variations were found, 54% of which were novel. The substantial number of novel structural variations discovered in the Pakistani genome reinforced previous inferences that (a) structural variations are a major type of variation in the genome and (b) compared with SNPs, they putatively exhibit equivalent or superior functional roles. This genome sequence information will be an important reference for population-wide genomic studies of the ethnically diverse South Asian subcontinent.
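Two steps in the abstract can be illustrated in code: calling a SNP novel when it is absent from both dbSNP and the 1000 Genomes Project, and testing whether a GO term is over-represented in a gene list. The sketch below is a minimal illustration, not the authors' pipeline. The abstract does not say how variants were matched or which enrichment test was used, so keying variants on (chromosome, position, ref, alt) and using a one-sided hypergeometric test are assumptions. The function names `novel_variants` and `hypergeom_enrichment_p` and all toy data are hypothetical.

```python
"""Sketch: flag SNP calls as novel and test a GO term for over-representation.

Assumptions (not taken from the paper): variants are keyed by
(chromosome, position, ref, alt); "known" means present in any supplied
reference set (stand-ins for dbSNP and 1000 Genomes); enrichment uses a
one-sided hypergeometric test. All names here are hypothetical.
"""
from math import comb
from typing import Iterable, Set, Tuple

# (chromosome, 1-based position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def novel_variants(calls: Iterable[Variant], *known_sets: Set[Variant]) -> Set[Variant]:
    """Return the calls that appear in none of the known-variant sets."""
    known: Set[Variant] = set().union(*known_sets)
    return {v for v in calls if v not in known}


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k), X ~ Hypergeometric(N, K, n).

    k: study-list genes annotated with the GO term
    n: number of genes in the study list
    K: background genes annotated with the term
    N: number of genes in the background
    """
    # Count gene subsets with at least k annotated genes, up to the largest possible overlap.
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 42, "G", "A")}
    dbsnp_like = {("chr1", 100, "A", "G")}  # toy stand-in for dbSNP
    kg_like = {("chr2", 42, "G", "A")}      # toy stand-in for 1000 Genomes
    print(sorted(novel_variants(calls, dbsnp_like, kg_like)))  # [('chr1', 250, 'C', 'T')]

    # Check the abstract's arithmetic: 388,532 novel of 3,224,311 SNPs.
    print(f"{388_532 / 3_224_311:.0%}")  # 12%

    # Toy enrichment: 3 of 5 study genes carry a term held by 4 of 20 background genes.
    print(round(hypergeom_enrichment_p(3, 5, 4, 20), 4))  # 0.032
```

Matching on the alternate allele as well as the position is a design choice: matching on position alone would treat a new allele at a catalogued site as known and so report fewer novel SNPs. A real GO analysis would also correct the per-term p-values for the many terms tested (for example, with the Benjamini-Hochberg procedure).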