Palaeogenetics Group, Institute of Organismic and Molecular Evolution (iomE), Johannes Gutenberg University, 55128 Mainz, Germany.
Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany.
Mol Biol Evol. 2023 Mar 4;40(3). doi: 10.1093/molbev/msad027.
Genomic regions under positive selection harbor variation linked, for example, to adaptation. Most tools for detecting positively selected variants have computational resource requirements that render them impractical for population genomic datasets with hundreds of thousands of individuals or more. We have developed and implemented an efficient haplotype-based approach that can scan large datasets and accurately detect positive selection. We achieve this by combining a pattern-matching approach based on the positional Burrows-Wheeler transform with model-based inference that requires only the evaluation of closed-form expressions. We evaluate our approach with simulations and find it to be both sensitive and specific. The computational resource requirements, quantified using UK Biobank data, indicate that our implementation scales to population genomic datasets with millions of individuals. Our approach may serve as an algorithmic blueprint for the era of "big data" genomics: a combinatorial core coupled with statistical inference in closed form.
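To illustrate the combinatorial core named in the abstract, the following is a minimal sketch of the generic positional Burrows-Wheeler transform update (prefix and divergence arrays, as described by Durbin, 2014). It is not the authors' implementation, and it does not include their model-based inference, which the abstract does not specify. The function names `pbwt_step` and `pbwt`, and the representation of haplotypes as lists of 0/1 alleles, are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def pbwt_step(
    column: Sequence[int], order: List[int], div: List[int], k: int
) -> Tuple[List[int], List[int]]:
    """Advance the pBWT by one biallelic site k (0-based).

    order: haplotype indices sorted by their reversed prefixes before site k.
    div:   div[i] is the first site of the longest match between order[i]
           and order[i - 1] that ends just before site k.
    Returns the updated (order, div) after including site k.
    """
    zeros, ones = [], []          # haplotypes carrying allele 0 / allele 1
    div_zeros, div_ones = [], []  # their match start positions
    p = q = k + 1                 # k + 1 means "no match ending at site k"
    for hap, start in zip(order, div):
        # Track the latest match start seen since the last 0 (p) or 1 (q).
        if start > p:
            p = start
        if start > q:
            q = start
        if column[hap] == 0:
            zeros.append(hap)
            div_zeros.append(p)
            p = 0
        else:
            ones.append(hap)
            div_ones.append(q)
            q = 0
    # Stable partition: all 0-carriers first, then all 1-carriers.
    return zeros + ones, div_zeros + div_ones


def pbwt(haplotypes: Sequence[Sequence[int]]) -> Tuple[List[int], List[int]]:
    """Run the pBWT over all sites and return the final (order, div)."""
    if not haplotypes:
        return [], []
    n_haps, n_sites = len(haplotypes), len(haplotypes[0])
    order, div = list(range(n_haps)), [0] * n_haps
    for k in range(n_sites):
        column = [hap[k] for hap in haplotypes]
        order, div = pbwt_step(column, order, div, k)
    return order, div
```

For the three haplotypes `[[0, 1], [1, 1], [0, 0]]`, `pbwt` returns the order `[2, 0, 1]` and the divergence array `[2, 2, 1]`. Each site costs time linear in the number of haplotypes, and after every step haplotypes that share long matches ending at the current site sit next to each other in `order`. That adjacency is what makes haplotype-based scans of very large cohorts feasible.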