Malhotra Seema, Singh Sayar, Sarkar Soma
Defence Institute of Physiology and Allied Sciences (DIPAS), Defence Research and Development Organization, Ministry of Defence, Government of India, Lucknow Road, Delhi, 110054, India.
Genes Genomics. 2018 May;40(5):497-510. doi: 10.1007/s13258-018-0650-z. Epub 2018 Jan 24.
India represents a remarkable confluence of geographically, linguistically and socially disparate ethnic populations (Indian Genome Variation Consortium, J Genet 87:3-20, 2008). Understanding the genetic diversity of the Indian population remains a daunting task. In this paper, we present a detailed analysis of genomic variation (high-depth coverage, ~30×, using the Illumina HiSeq 2000 platform) in three healthy Indian male individuals, each belonging to one of three geographically delineated regions and linguistic phyla, viz. the high-altitude region of Ladakh (Tibeto-Burman linguistic phylum), the sub-mountainous region of Kumaun (Indo-European linguistic phylum) and the sea-level region of Telangana (Dravidian linguistic phylum), to probe the extent of genetic diversity in our population. Sequencing yielded high-quality data (~95% of the total reads aligned to the human reference genome for each sample) with very good alignment quality (>80% of the filtered mapped reads had a quality score of 60). By comparison with the human reference genome, a total of 4.3, 3.7 and 4.3 million single nucleotide variations were identified in the genomes of the high-altitude, sub-mountainous and sea-level individuals, respectively. Approximately 17.3%, 18.2% and 17.4% of the variants, respectively, were unique to each of the three genomes. The study identified many novel variations in the three diverse genomes (132,970 in the Ladakh, 112,317 in the Kumaun and 128,881 in the Telangana individual) and provides an important resource for creating a baseline and a comprehensive catalogue of human genomic variation across India as well as the Asian continent.
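The abstract does not state how a variant was judged "unique" to one genome. One common reading is a set difference: a variant is unique to a sample if it is absent from every other sample's call set. The sketch below illustrates that reading only; it assumes each variant is keyed by a hypothetical (chromosome, position, reference allele, alternate allele) tuple and uses exact matching, and the toy call sets are invented for illustration, not data from this study or the authors' pipeline.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def unique_fraction(callsets: Dict[str, Set[Variant]]) -> Dict[str, float]:
    """For each sample, return the fraction of its variants absent from all other samples."""
    result: Dict[str, float] = {}
    for name, variants in callsets.items():
        # Union of every variant called in the other samples.
        others: Set[Variant] = set().union(*(v for n, v in callsets.items() if n != name))
        # Variants seen only in this sample (exact-match comparison assumed).
        unique = variants - others
        result[name] = len(unique) / len(variants) if variants else 0.0
    return result


# Toy call sets (invented values, for illustration only).
calls: Dict[str, Set[Variant]] = {
    "Ladakh": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
               ("chr2", 50, "G", "A"), ("chr3", 10, "T", "C")},
    "Kumaun": {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"),
               ("chr5", 7, "A", "T")},
    "Telangana": {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
                  ("chr4", 5, "C", "G"), ("chr6", 9, "G", "C")},
}

fractions = unique_fraction(calls)
for sample, frac in fractions.items():
    print(f"{sample}: {frac:.1%} unique")
```

In practice, call sets would come from VCF files, and variants usually need normalization (left-alignment, splitting multi-allelic sites) before exact matching is meaningful. Those steps are outside this sketch.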