Moshkov Nikita, Smetanin Aleksandr, Tatarinova Tatiana V
Doctoral School of Interdisciplinary Medicine, University of Szeged, Szeged, Hungary.
Synthetic and Systems Biology Unit, Biological Research Centre, Szeged, Hungary.
PeerJ. 2021 Dec 14;9:e12502. doi: 10.7717/peerj.12502. eCollection 2021.
We developed PyLAE, a new tool for determining local ancestry along a genome using whole-genome sequencing data or high-density genotyping experiments. PyLAE can process an arbitrarily large number of ancestral populations (with or without an informative prior). Since PyLAE does not involve estimating many parameters, it can process thousands of genomes within a day. PyLAE can run on phased or unphased genomic data. We have shown how PyLAE can be applied to the identification of differentially enriched pathways between populations. The local ancestry approach results in higher enrichment scores compared to whole-genome approaches. We benchmarked PyLAE using the 1000 Genomes dataset, comparing the aggregated predictions with the global admixture results and the current gold standard program RFMix. Computational efficiency, minimal requirements for data pre-processing, straightforward presentation of results, and ease of installation make PyLAE a valuable tool to study admixed populations.
The source code and installation manual are available at https://github.com/smetam/pylae.
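The benchmark described in the abstract compares aggregated local ancestry predictions with global admixture results. A minimal sketch of what such an aggregation could look like is given here, assuming per-window ancestry labels for one individual and equally sized windows. The function name, the population labels, and the input format are hypothetical illustrations and are not taken from the PyLAE code base.

```python
from collections import Counter


def global_ancestry_fractions(window_labels):
    """Aggregate per-window ancestry labels into genome-wide fractions.

    Assumes every window covers the same share of the genome, so each
    label carries equal weight (an illustrative simplification).
    """
    if not window_labels:
        raise ValueError("window_labels must not be empty")
    counts = Counter(window_labels)
    total = len(window_labels)
    return {population: count / total for population, count in counts.items()}


# Hypothetical local ancestry calls for eight consecutive genomic windows.
example_windows = ["EUR", "EUR", "AFR", "EUR", "AMR", "AFR", "EUR", "EUR"]
fractions = global_ancestry_fractions(example_windows)
print(fractions)
```

Fractions computed this way could then be set against the output of a global admixture method for the same individual. If windows differ in length, weighting each label by window size would probably be more appropriate than simple counting.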