Lynch, Michael
Department of Biology, Indiana University, Bloomington, Indiana 47405, USA.
Genetics. 2009 May;182(1):295-301. doi: 10.1534/genetics.109.100479. Epub 2009 Mar 16.
A new generation of high-throughput sequencing strategies will soon lead to the acquisition of high-coverage genomic profiles of hundreds to thousands of individuals within species, generating unprecedented levels of information on the frequencies of nucleotides segregating at individual sites. However, because these new technologies are error prone and yield uneven coverage of alleles in diploid individuals, they also introduce the need for novel methods for analyzing the raw read data. A maximum-likelihood method for the estimation of allele frequencies is developed, eliminating both the need to arbitrarily discard individuals with low coverage and the requirement for an extrinsic measure of the sequence error rate. The resultant estimates are nearly unbiased with asymptotically minimal sampling variance, thereby defining the limits to our ability to estimate population-genetic parameters and providing a logical basis for the optimal design of population-genomic surveys.
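The abstract describes a maximum-likelihood estimator that uses raw read data from every individual, including those with low coverage, and estimates the sequencing error rate jointly with the allele frequency. The sketch below illustrates that general idea only; it is not the likelihood derived in the paper. It assumes a biallelic site, Hardy-Weinberg genotype proportions, a single symmetric error rate `e` at which a read of one allele is recorded as the other, and a coarse grid search in place of a proper optimizer. The function names, the grid sizes, and the example counts are all hypothetical.

```python
import math


def site_log_likelihood(p, e, counts):
    """Log-likelihood of minor-allele frequency p and error rate e.

    counts holds one (n_reads, k_minor_reads) pair per diploid individual.
    Assumption (not from the source): Hardy-Weinberg genotype priors and a
    symmetric two-allele error model.
    """
    # Genotype priors: major homozygote, heterozygote, minor homozygote.
    priors = ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)
    # Probability that a single read shows the minor allele, per genotype.
    read_probs = (e, 0.5, 1 - e)
    total = 0.0
    for n, k in counts:
        lik = 0.0
        for prior, q in zip(priors, read_probs):
            # Binomial probability of k minor reads out of n for this genotype.
            lik += prior * math.comb(n, k) * q ** k * (1 - q) ** (n - k)
        total += math.log(lik)
    return total


def estimate_allele_frequency(counts, p_steps=199, e_steps=50, e_max=0.1):
    """Grid search for the (p, e) pair with the highest log-likelihood.

    The grid excludes p = 0, p = 1 and e = 0 so the likelihood stays positive.
    """
    best_ll, best_p, best_e = -math.inf, None, None
    for i in range(1, p_steps + 1):
        p = i / (p_steps + 1)
        for j in range(1, e_steps + 1):
            e = e_max * j / (e_steps + 1)
            ll = site_log_likelihood(p, e, counts)
            if ll > best_ll:
                best_ll, best_p, best_e = ll, p, e
    return best_p, best_e


if __name__ == "__main__":
    # Hypothetical data: (total reads, reads showing the minor allele).
    # The (3, 1) individual has low coverage but is still used.
    counts = [(10, 0)] * 5 + [(10, 1), (8, 4), (12, 5), (3, 1), (9, 9)]
    p_hat, e_hat = estimate_allele_frequency(counts)
    print(f"estimated minor-allele frequency: {p_hat:.3f}")
    print(f"estimated error rate: {e_hat:.4f}")
```

Because each individual contributes a likelihood term that sums over its possible genotypes, a poorly covered individual is simply less informative rather than being discarded, and the error rate is inferred from the same reads instead of being supplied from outside.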