Population Stratification and Underrepresentation of Indian Subcontinent Genetic Diversity in the 1000 Genomes Project Dataset

Authors

Sengupta Dhriti, Choudhury Ananyo, Basu Analabha, Ramsay Michèle

Affiliations

Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.

National Institute of Biomedical Genomics, Kalyani, India

Publication information

Genome Biol Evol. 2016 Dec 31;8(11):3460-3470. doi: 10.1093/gbe/evw244.

DOI: 10.1093/gbe/evw244

PMID: 27797945

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5203783/

Abstract

Genomic variation in Indian populations is of great interest due to the diversity of ancestral components, social stratification, endogamy and complex admixture patterns. With an expanding population of 1.2 billion, India is also a treasure trove to catalogue innocuous as well as clinically relevant rare mutations. Recent studies have revealed four dominant ancestries in populations from mainland India: Ancestral North-Indian (ANI), Ancestral South-Indian (ASI), Ancestral Tibeto-Burman (ATB) and Ancestral Austro-Asiatic (AAA). The 1000 Genomes Project (KGP) Phase-3 data include about 500 genomes from five linguistically defined Indian-Subcontinent (IS) populations (Punjabi, Gujarati, Bengali, Telugu and Tamil), some of whom are recent migrants to the USA or UK. Comparative analyses show that despite the distinct geographic origins of the KGP-IS populations, the ANI component is predominantly represented in this dataset. Previous studies demonstrated population substructure in the HapMap Gujarati population, and we found evidence for additional substructure in the Punjabi and Telugu populations. These substructured populations have characteristic and significant differences in heterozygosity and inbreeding coefficients. Moreover, we demonstrate that the substructure is better explained by factors such as differences in the proportion of ancestral components and endogamy-driven social structure than by invoking a novel ancestral component. Therefore, using language and/or geography as a proxy for an ethnic unit is inadequate for many of the IS populations. This highlights the necessity for more nuanced sampling strategies or corrective statistical approaches, particularly for biomedical and population genetics research in India.
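The abstract compares heterozygosity and inbreeding coefficients between subgroups. As a minimal sketch of how such per-individual statistics are commonly computed (not necessarily the authors' pipeline), the Python below takes a toy genotype matrix coded as 0/1/2 copies of the alternate allele and applies a method-of-moments estimator, F = (O - E) / (N - E), where O is the observed number of homozygous sites, E is the number expected under Hardy-Weinberg proportions, and N is the number of sites. This is similar in form to the estimator reported by tools such as PLINK. The function names and the input layout are assumptions made for illustration, and the sketch omits missing-data handling and small-sample corrections.

```python
from typing import List, Sequence


def allele_frequencies(genotypes: Sequence[Sequence[int]]) -> List[float]:
    """Alternate-allele frequency per SNP.

    genotypes[i][j] is the number of alternate alleles (0, 1 or 2)
    carried by individual i at SNP j. Layout is an illustrative assumption.
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    return [sum(ind[j] for ind in genotypes) / (2 * n_ind) for j in range(n_snp)]


def observed_heterozygosity(genotypes: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of heterozygous sites for each individual."""
    return [sum(1 for g in ind if g == 1) / len(ind) for ind in genotypes]


def inbreeding_coefficients(genotypes: Sequence[Sequence[int]]) -> List[float]:
    """Method-of-moments F = (O - E) / (N - E) for each individual."""
    freqs = allele_frequencies(genotypes)
    n_snp = len(freqs)
    # Expected homozygous sites under Hardy-Weinberg: sum over SNPs of 1 - 2pq.
    expected_hom = sum(1.0 - 2.0 * p * (1.0 - p) for p in freqs)
    if n_snp - expected_hom == 0:
        raise ValueError("all SNPs are monomorphic; F is undefined")
    coefficients = []
    for ind in genotypes:
        observed_hom = sum(1 for g in ind if g != 1)
        coefficients.append((observed_hom - expected_hom) / (n_snp - expected_hom))
    return coefficients
```

On real data, one would typically estimate allele frequencies within each putative subgroup (or from a reference panel) before comparing the distributions of F and heterozygosity between subgroups, since frequencies pooled across a substructured sample inflate the expected heterozygosity.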

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2caa/5203783/f44fcf6484ca/evw244f1p.jpg
