Liu Xuanyao, Saw Woei-Yuh, Ali Mohammad, Ong Rick Twee-Hee, Teo Yik-Ying
Saw Swee Hock School of Public Health, National University of Singapore, MD3 16 Medical Drive, Singapore 117597, Singapore.
BMC Genomics. 2014 May 2;15(1):332. doi: 10.1186/1471-2164-15-332.
The HUGO Pan-Asian SNP Consortium (PASNP) has generated a genetic resource of almost 55,000 autosomal single nucleotide polymorphisms (SNPs) across more than 1,800 individuals from 73 urban and indigenous populations in Asia. This has offered valuable insights into the correlation of the genetic ancestry of these populations with major linguistic systems and geography. Here, we attempt to understand whether adaptation to local climate, diet and environment partly explains the genetic variation present in these populations by investigating the genomic signatures of positive selection.
To evaluate how the considerably lower SNP density of PASNP, compared with other population genetics resources such as the International HapMap Project (HapMap) or the Singapore Genome Variation Project, affects the selection analyses, we assessed the extent of haplotype phasing switch errors and the consistency of selection signals from three haplotype-based approaches (iHS, XP-EHH, haploPS) when the HapMap data are thinned to a density similar to that of PASNP. We subsequently applied haploPS to detect and characterize positive selection in the PASNP populations, identifying 59 genomic regions that were selected in at least one PASNP population. A cluster analysis based on these 59 signals showed that indigenous populations such as the Negrito from Malaysia and the Philippines, the China Hmong, and the Taiwan Ami and Atayal shared more of these signals. We also report evidence of a positive selection signal encompassing the beta globin gene in the Taiwan Ami and Atayal that was distinct from the signal in the HapMap Africans, suggesting the possibility of convergent evolution at this locus due to malarial selection.
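The thinning and switch-error evaluation described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `thin_evenly` and `switch_error_rate`, the even-spacing thinning rule, and the 0/1 allele encoding are all assumptions made for illustration, and the phasing step itself (which would be run by an external phasing program on the thinned data) is not shown.

```python
def thin_evenly(n_sites, target):
    """Return indices of roughly evenly spaced sites to keep (hypothetical helper).

    Assumes simple index-based spacing; a real analysis might instead
    match physical or genetic-map density.
    """
    if target <= 0:
        return []
    if target >= n_sites:
        return list(range(n_sites))
    step = n_sites / target
    return [int(i * step) for i in range(target)]


def switch_error_rate(true_hap, inferred_hap):
    """Fraction of adjacent heterozygous-site pairs whose phase is switched.

    true_hap, inferred_hap: pairs (h1, h2) of equal-length 0/1 allele lists.
    Assumes both pairs encode the same genotypes at every site.
    """
    t1, t2 = true_hap
    i1, _ = inferred_hap
    # Only heterozygous sites carry phase information.
    het = [k for k in range(len(t1)) if t1[k] != t2[k]]
    if len(het) < 2:
        return 0.0
    # Orientation: does inferred haplotype 1 carry the allele of true haplotype 1?
    orient = [i1[k] == t1[k] for k in het]
    # A switch is a change of orientation between consecutive heterozygous sites.
    switches = sum(1 for a, b in zip(orient, orient[1:]) if a != b)
    return switches / (len(het) - 1)


# Toy example: four heterozygous sites, one switch between the second and third.
truth = ([0, 1, 0, 1, 1], [1, 0, 1, 0, 1])
inferred = ([0, 1, 1, 0, 1], [1, 0, 0, 1, 1])
rate = switch_error_rate(truth, inferred)
kept = thin_evenly(10, 5)
```

In this sketch, comparing the switch error rate of haplotypes phased from the full data against those phased from the sites returned by `thin_evenly` would give a rough sense of how much phasing accuracy is lost at lower SNP density.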
We established that the lower SNP content of the PASNP data conferred a weaker ability to detect signatures of positive selection, although the new approach haploPS retained modest power. Across all the populations in PASNP, we identified only 59 signals, suggesting a strong need for high-density population-level genotyping data or sequencing data in order to achieve a comprehensive survey of positive selection in Asian populations.