University of Oxford, Wellcome Centre for Human Genetics, Oxford, UK.
Bioinformatics. 2019 Mar 1;35(5):798-806. doi: 10.1093/bioinformatics/bty735.
The Li and Stephens model, which approximates the coalescent describing the pattern of variation in a population, underpins a range of key tools and results in genetics. Although highly efficient compared to the coalescent, standard implementations of this model still cannot deal with the very large reference cohorts that are starting to become available, and practical implementations use heuristics to achieve reasonable runtimes.
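To illustrate why reference cohort size matters, the following is a minimal sketch of a conventional Viterbi pass for a simplified haploid Li and Stephens model. It is not taken from the paper: the function name `ls_viterbi`, the parameters `switch` and `mismatch`, and the uniform switch model are assumptions chosen for illustration. Every site requires a pass over all reference haplotypes, so the cost grows linearly with the size of the panel.

```python
import math

def ls_viterbi(panel, query, switch=0.01, mismatch=0.001):
    """Most likely copying path of `query` through `panel`.

    Simplified haploid model (illustrative assumptions): at each site the
    copied haplotype is resampled uniformly with probability `switch`, and
    the copied allele is observed incorrectly with probability `mismatch`.
    """
    n, n_sites = len(panel), len(query)
    log_stay = math.log(1.0 - switch + switch / n)
    log_jump = math.log(switch / n)
    log_emit = {True: math.log(1.0 - mismatch), False: math.log(mismatch)}

    # Uniform prior over reference haplotypes at the first site.
    score = [-math.log(n) + log_emit[h[0] == query[0]] for h in panel]
    back = []
    for s in range(1, n_sites):
        # The best jump source is shared by all states, giving O(n) per site.
        best = max(range(n), key=score.__getitem__)
        jump = score[best] + log_jump
        new_score, pointers = [], []
        for j in range(n):
            stay = score[j] + log_stay
            prev, val = (j, stay) if stay >= jump else (best, jump)
            new_score.append(val + log_emit[panel[j][s] == query[s]])
            pointers.append(prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

For a query that matches one reference haplotype on the left and another on the right, the returned path switches once between them, and the inner loop over `range(n)` is the part whose cost scales with the cohort.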
Here I describe a new, exact algorithm ('fastLS') that implements the Li and Stephens model and achieves runtimes independent of the size of the reference cohort. Key to achieving this runtime is the use of the Burrows-Wheeler transform, allowing the algorithm to efficiently identify partial haplotype matches across a cohort. I show that the proposed data structure is very similar to, and generalizes, Durbin's positional Burrows-Wheeler transform.
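As background for the comparison with Durbin's positional Burrows-Wheeler transform (PBWT), the sketch below builds the positional prefix ordering that the PBWT is based on. This is the standard PBWT construction for binary alleles, not the fastLS data structure itself; the function name `pbwt_prefix_arrays` is hypothetical. At each site the haplotypes are kept sorted by their reversed prefixes, so haplotypes sharing a long match ending at that site sit next to each other in the ordering.

```python
def pbwt_prefix_arrays(haplotypes):
    """Return the PBWT positional prefix orderings for binary haplotypes.

    Element k of the result lists haplotype indices sorted by their reversed
    prefixes (alleles at sites k-1, k-2, ..., 0). The final element is the
    ordering after all sites have been processed.
    """
    order = list(range(len(haplotypes)))
    n_sites = len(haplotypes[0]) if haplotypes else 0
    orderings = [list(order)]
    for k in range(n_sites):
        # Stable partition by the allele at site k: zeros first, then ones.
        zeros = [i for i in order if haplotypes[i][k] == 0]
        ones = [i for i in order if haplotypes[i][k] != 0]
        order = zeros + ones
        orderings.append(list(order))
    return orderings


haps = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
orderings = pbwt_prefix_arrays(haps)
```

In this toy panel, haplotypes 0 and 3 are identical and end up adjacent in the final ordering. Each update is a single stable partition, which is what makes the ordering cheap to maintain site by site.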