Prakrithi P, Singhal Khushboo, Sharma Disha, Jain Abhinav, Bhoyar Rahul C, Imran Mohamed, Senthilvel Vigneshwar, Divakar Mohit Kumar, Mishra Anushree, Scaria Vinod, Sivasubbu Sridhar, Mukerji Mitali
CSIR Institute of Genomics and Integrative Biology, Mathura Road, New Delhi 110025, India.
NAR Genom Bioinform. 2022 Feb 15;4(1):lqac009. doi: 10.1093/nargab/lqac009. eCollection 2022 Mar.
Actively retrotransposing primate-specific repeats display insertion-deletion (InDel) polymorphism through their insertion at new loci. In global datasets, Indian populations, and therefore their InDels, remain under-represented. Here, we report the genomic landscape of InDels from the recently released 1021 Indian Genomes (IndiGen) (available at https://clingen.igib.res.in/indigen). We identified 9239 polymorphic insertions, comprising private (3831), rare (3974) and common (1434) insertions, with an average of 770 insertions per individual. PCR validation confirmed 89% of the predicted genotypes in the 94 samples tested. About 60% of the identified InDels are unique to IndiGen when compared with other global datasets; 23% of sites were shared with both SGDP and HGSVC, and among these, 58% (1289 sites) were common polymorphisms in IndiGen. The insertions show a bias for genic regions, with a preference for introns, and the associated genes show enrichment for processes such as cell morphogenesis and neurogenesis (p-value < 0.05). Approximately 60% of InDels mapped to genes present in the OMIM database. Finally, we show that 558 InDels can serve as ancestry informative markers to segregate global populations. This study provides a valuable resource of baseline InDels that would be useful in population genomics.
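As a quick arithmetic check on the frequency classes reported above, the following minimal sketch computes the share of each class among the 9239 polymorphic insertions. It uses only the counts stated in the abstract; the variable names are illustrative, and the frequency thresholds that define "private", "rare" and "common" are not given here and are not assumed.

```python
# Counts of polymorphic insertions per frequency class, as reported in the abstract.
counts = {"private": 3831, "rare": 3974, "common": 1434}
reported_total = 9239  # total number of polymorphic insertions stated in the abstract

# The three classes should partition the full set of insertions.
total = sum(counts.values())

# Percentage share of each class among all polymorphic insertions.
shares = {name: 100 * n / total for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {counts[name]} insertions ({pct:.1f}%)")
```

The three classes sum exactly to the reported total of 9239, so private and rare insertions together account for the large majority of sites, and common insertions for roughly one in six.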