Department of Computer Science, Columbia University, New York, New York 10027, USA.
Genetics. 2012 Feb;190(2):679-89. doi: 10.1534/genetics.111.134874. Epub 2011 Nov 30.
Whole-genome sequencing in an isolated population with few founders directly ascertains variants from the population bottleneck that may be rare elsewhere. In such populations, shared haplotypes allow imputation of variants in unsequenced samples without resorting to the complex statistical methods used in studies of outbred cohorts. We focus on an isolated population cohort from the Pacific island of Kosrae, Micronesia, where we previously collected SNP array and rich phenotype data for the majority of the population. We report the identification of long regions with haplotypes co-inherited between pairs of individuals, and a methodology to leverage such shared genetic content for imputation. Our estimates show that sequencing as few as 40 personal genomes allows for inference in up to 60% of the 3000-person cohort at the average locus. We ascertained a pilot data set of whole-genome sequences from seven Kosraean individuals, at an average of 5× coverage. This assay identified 5,735,306 unique sites, of which 1,212,831 were previously unknown. Additionally, these variants are unusually enriched for alleles that are rare in other populations when compared to geographic neighbors (published Korean genome SJK). We used the presence of shared haplotypes between the seven Kosraean individuals to estimate the expected imputation accuracy of known and novel homozygous variants at 99.6% and 97.3%, respectively. This study presents a whole-genome analysis of a homogeneous isolate population, with emphasis on optimal rare variant inference.
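As a rough illustration of how shared haplotypes can translate into imputation coverage at a single locus, the following minimal Python sketch counts the unsequenced individuals who share a segment spanning that locus with at least one sequenced individual. It is not the method used in the study: the `SharedSegment` record, the function names, and the toy data are all hypothetical, and the sketch treats sharing per individual rather than per haplotype, which is a simplifying assumption.

```python
from typing import Iterable, NamedTuple, Set


class SharedSegment(NamedTuple):
    """Hypothetical record: a region co-inherited by individuals a and b."""
    a: str
    b: str
    start: int  # first base-pair position of the shared region (inclusive)
    end: int    # last base-pair position of the shared region (inclusive)


def imputable_individuals(
    segments: Iterable[SharedSegment], sequenced: Set[str], locus: int
) -> Set[str]:
    """Return unsequenced individuals who share a segment covering `locus`
    with at least one sequenced individual."""
    imputable: Set[str] = set()
    for seg in segments:
        if not (seg.start <= locus <= seg.end):
            continue  # segment does not span the locus of interest
        if seg.a in sequenced and seg.b not in sequenced:
            imputable.add(seg.b)
        elif seg.b in sequenced and seg.a not in sequenced:
            imputable.add(seg.a)
    return imputable


def imputable_fraction(
    segments: Iterable[SharedSegment],
    sequenced: Set[str],
    cohort_size: int,
    locus: int,
) -> float:
    """Fraction of the whole cohort that could be inferred at `locus`."""
    return len(imputable_individuals(segments, sequenced, locus)) / cohort_size


# Toy data (invented for illustration): two sequenced, three unsequenced.
toy_segments = [
    SharedSegment("s1", "u1", 100, 500),
    SharedSegment("u2", "s2", 300, 900),
    SharedSegment("u1", "u3", 100, 900),  # neither individual is sequenced
]
toy_sequenced = {"s1", "s2"}

print(imputable_fraction(toy_segments, toy_sequenced, cohort_size=5, locus=400))
```

Averaging such a per-locus fraction over many loci would give a figure comparable in spirit to the "average locus" estimate quoted above, although the actual estimate in the study may rest on different definitions.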