CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
Genetics. 2011 Jan;187(1):229-44. doi: 10.1534/genetics.110.122614. Epub 2010 Nov 1.
Summary statistics are widely used in population genetics, but they share a drawback: no simple sufficient summary statistic exists that captures all the information required to distinguish between different evolutionary hypotheses. Here, we apply boosting, a recent statistical method that combines simple classification rules to maximize their joint predictive performance. We show that our implementation of boosting has high power to detect selective sweeps. Demographic events, such as bottlenecks, do not result in a large excess of false positives. A comparison shows that our boosting implementation performs well relative to other neutrality tests. Furthermore, we evaluated the relative contribution of different summary statistics to the identification of selection and found that for recent sweeps integrated haplotype homozygosity is very informative, whereas older sweeps are better detected by Tajima's π. Overall, Watterson's θ was found to contribute the most information for distinguishing between bottlenecks and selection.
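To make the idea of combining simple classification rules concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say which boosting variant was used, so the sketch assumes AdaBoost with one-statistic threshold rules ("decision stumps"). The two statistics, their values, the function names, and the summed-weight measure of each statistic's contribution are all hypothetical and chosen only for illustration.

```python
import math

# Toy training set (made-up numbers, NOT simulated or real data). Each row
# holds two hypothetical summary statistics for one genomic window;
# labels are +1 = sweep, -1 = neutral.
X = [(1, 2), (2, 4), (3, 5), (4, 1), (5, 3), (6, 6)]
y = [-1, -1, 1, -1, 1, 1]


def stump_predict(stump, row):
    """Simple rule: vote `sign` if statistic j exceeds threshold t, else -sign."""
    j, t, sign = stump
    return sign if row[j] > t else -sign


def best_stump(X, y, w):
    """Return (weighted_error, stump) for the best single-statistic rule."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            for sign in (1, -1):
                stump = (j, t, sign)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if best is None or err < best[0]:
                    best = (err, stump)
    return best


def adaboost(X, y, rounds):
    """Assumed variant: discrete AdaBoost over decision stumps."""
    n = len(X)
    w = [1.0 / n] * n  # start with equal weight on every window
    ensemble = []
    for _ in range(rounds):
        err, stump = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this rule
        ensemble.append((alpha, stump))
        # Up-weight misclassified windows, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Weighted vote of all rules; +1 = sweep, -1 = neutral."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score > 0 else -1


def relative_weight(ensemble, n_stats):
    """Hypothetical contribution measure: share of total vote weight per statistic."""
    totals = [0.0] * n_stats
    for alpha, (j, _, _) in ensemble:
        totals[j] += alpha
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


model = adaboost(X, y, rounds=3)
train_errors = sum(predict(model, row) != yi for row, yi in zip(X, y))
print("training errors:", train_errors)
print("relative weight per statistic:", relative_weight(model, 2))
```

In this toy set no single threshold rule separates the two classes, whereas three boosted rules drawn from both statistics classify every training window correctly. Training accuracy on six hand-picked points says nothing about power or false-positive rates; those would have to be estimated on independent simulations.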