Boudeau Samantha, Ramakodi Meganathan P, Zhou Yan, Liu Jeffrey C, Ragin Camille, Kulathinal Rob J
Department of Biology, Temple University, Philadelphia, PA, United States.
Cancer Prevention and Control Program, Fox Chase Cancer Center, Philadelphia, PA, United States.
Front Genet. 2023 Feb 23;14:1061781. doi: 10.3389/fgene.2023.1061781. eCollection 2023.
Human populations are often highly structured due to differences in genetic ancestry among groups, which makes it difficult to associate genes with diseases. Ancestry-informative markers (AIMs) aid in the detection of population stratification and provide an alternative approach to mapping population-specific alleles to disease. Here, we identify and characterize a novel set of African AIMs that separate populations of African ancestry from other global populations, including those of European ancestry. Using data from the 1000 Genomes Project, highly informative SNP markers from five African subpopulations were selected based on estimates of informativeness (In) and compared against the European population to generate a final set of 46,737 African AIMs. The AIMs identified were validated using an independent dataset and functionally annotated using tools such as SIFT and PolyPhen. They were also investigated for their representation on commonly used SNP arrays. This set of African AIMs effectively separates populations of African ancestry from other global populations and further identifies substructure between populations of African ancestry. When a subset of these AIMs was studied in an independent dataset, they differentiated people who self-identify as African American or Black from those who identify their ancestry as primarily European. Most of the AIMs were found in intergenic and intronic regions, with only 0.6% in the coding regions of the genome. Most of the commonly used SNP arrays investigated contained fewer than 10% of the AIMs. While several functional annotations of both coding and non-coding African AIMs are supported by the literature and link these high-frequency African alleles to diseases in African populations, more effort is needed to map genes to diseases in these genetically diverse subpopulations. The relative dearth of these African AIMs on current genotyping platforms (the array with the highest fraction, Illumina's Omni 5, harbors fewer than a quarter of the AIMs) further demonstrates the need to better represent historically understudied populations.
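The abstract does not state how informativeness (In) was computed. As a minimal sketch, assuming In refers to the informativeness-for-assignment statistic of Rosenberg et al. (2003), the value for a single biallelic SNP could be calculated as below. The function name `informativeness` and the example frequencies are hypothetical illustrations, not values or code from the study.

```python
import math


def informativeness(freqs_by_pop):
    """Informativeness for assignment (In) of one biallelic SNP.

    freqs_by_pop: frequency of one allele in each of K populations.
    Assumes equal weighting of populations and natural logarithms.
    """
    k = len(freqs_by_pop)
    total = 0.0
    # Loop over both alleles: the given allele and its complement.
    for allele_freqs in (freqs_by_pop, [1.0 - p for p in freqs_by_pop]):
        mean_p = sum(allele_freqs) / k
        # Entropy term of the pooled (mean) allele frequency.
        if mean_p > 0:
            total -= mean_p * math.log(mean_p)
        # Subtract the mean within-population entropy; 0 * log(0) is taken as 0.
        total += sum(p * math.log(p) for p in allele_freqs if p > 0) / k
    return total


# Hypothetical frequencies for illustration only.
fixed_difference = informativeness([0.0, 1.0])  # allele absent in one group, fixed in the other
no_difference = informativeness([0.3, 0.3])     # identical frequencies in both groups
```

Under this definition, a SNP with identical allele frequencies in every population scores 0, and a fixed difference between two populations scores the maximum of ln 2, so ranking SNPs by In favors markers whose frequencies diverge most between the groups being compared.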