Population Stratification and Underrepresentation of Indian Subcontinent Genetic Diversity in the 1000 Genomes Project Dataset.

Authors

Sengupta Dhriti, Choudhury Ananyo, Basu Analabha, Ramsay Michèle

Affiliations

Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.

National Institute of Biomedical Genomics, Kalyani, India.

Publication information

Genome Biol Evol. 2016 Dec 31;8(11):3460-3470. doi: 10.1093/gbe/evw244.

Abstract

Genomic variation in Indian populations is of great interest due to the diversity of ancestral components, social stratification, endogamy and complex admixture patterns. With an expanding population of 1.2 billion, India is also a treasure trove for cataloguing innocuous as well as clinically relevant rare mutations. Recent studies have revealed four dominant ancestries in populations from mainland India: Ancestral North-Indian (ANI), Ancestral South-Indian (ASI), Ancestral Tibeto-Burman (ATB) and Ancestral Austro-Asiatic (AAA). The 1000 Genomes Project (KGP) Phase-3 data include about 500 genomes from five linguistically defined Indian-Subcontinent (IS) populations (Punjabi, Gujarati, Bengali, Telugu and Tamil), some of whom are recent migrants to the USA or UK. Comparative analyses show that despite the distinct geographic origins of the KGP-IS populations, the ANI component is predominantly represented in this dataset. Previous studies demonstrated population substructure in the HapMap Gujarati population, and we found evidence for additional substructure in the Punjabi and Telugu populations. These substructured populations have characteristic/significant differences in heterozygosity and inbreeding coefficients. Moreover, we demonstrate that the substructure is better explained by factors such as differences in the proportion of ancestral components and endogamy-driven social structure than by invoking a novel ancestral component. Therefore, using language and/or geography as a proxy for an ethnic unit is inadequate for many of the IS populations. This highlights the necessity for more nuanced sampling strategies or corrective statistical approaches, particularly for biomedical and population genetics research in India.
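
The abstract compares heterozygosity and inbreeding coefficients between substructured groups without saying how they were computed. As a rough illustration only, the sketch below uses a common textbook estimator, F = 1 - H_obs / H_exp, where H_exp is the Hardy-Weinberg expectation 2p(1 - p) averaged over biallelic sites. The function name `heterozygosity_and_f` and the genotype encoding (alternate-allele counts 0, 1, 2) are assumptions made for this example; the authors' actual procedure may differ.

```python
from typing import Sequence, Tuple


def heterozygosity_and_f(
    genotypes: Sequence[Sequence[int]],
) -> Tuple[float, float, float]:
    """Population-level heterozygosity and inbreeding coefficient.

    genotypes[i][j] is the alternate-allele count (0, 1 or 2) for
    individual i at biallelic site j (hypothetical encoding).
    Returns (observed heterozygosity, expected heterozygosity, F).
    """
    n_ind = len(genotypes)
    n_sites = len(genotypes[0])
    obs_total = 0.0
    exp_total = 0.0
    for j in range(n_sites):
        column = [individual[j] for individual in genotypes]
        # Alternate-allele frequency at this site.
        p = sum(column) / (2 * n_ind)
        # Fraction of individuals that are heterozygous here.
        obs_total += sum(1 for g in column if g == 1) / n_ind
        # Expected heterozygosity under Hardy-Weinberg equilibrium.
        exp_total += 2 * p * (1 - p)
    h_obs = obs_total / n_sites
    h_exp = exp_total / n_sites
    # F > 0 indicates a deficit of heterozygotes relative to expectation.
    f = 1 - h_obs / h_exp if h_exp > 0 else float("nan")
    return h_obs, h_exp, f
```

Applied separately to each subgroup of, say, the Telugu sample, a higher F in one subgroup would be consistent with stronger endogamy, though small samples and sites monomorphic within a group make this simple estimator noisy.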


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2caa/5203783/f44fcf6484ca/evw244f1p.jpg
