Szpiech Zachary A, Hernandez Ryan D
Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco
Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco; Institute for Human Genetics, University of California, San Francisco; Institute for Quantitative Biosciences (QB3), University of California, San Francisco.
Mol Biol Evol. 2014 Oct;31(10):2824-7. doi: 10.1093/molbev/msu211. Epub 2014 Jul 10.
Haplotype-based scans for natural selection are useful for identifying recent or ongoing positive selection in genomes. As both real and simulated genomic data sets grow larger, spanning thousands of samples and millions of markers, there is a need for a fast and efficient implementation of these scans for general use. Here, we present selscan, an efficient multithreaded application that implements Extended Haplotype Homozygosity (EHH), Integrated Haplotype Score (iHS), and Cross-population EHH (XPEHH). selscan accepts phased genotypes in multiple formats, including TPED, performs extremely well on both simulated and real data, and is over an order of magnitude faster than existing implementations. It calculates iHS on chromosome 22 (22,147 loci) across 204 CEU haplotypes in 353 s on one thread (33 s on 16 threads) and calculates XPEHH for the same data relative to 210 YRI haplotypes in 578 s on one thread (52 s on 16 threads). Source code and binaries (Windows, OSX, and Linux) are available at https://github.com/szpiech/selscan.
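To make the EHH statistic named above concrete, the following is a minimal sketch of one common way to define it: among the haplotypes that carry a given allele at a core marker, EHH at a second marker is the fraction of haplotype pairs that are identical across the whole interval between the two markers. This is an illustration only and not selscan's implementation; the function name `ehh`, its arguments, and the toy haplotypes are hypothetical, and physical or genetic distances between markers are ignored.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, end, allele):
    """Illustrative EHH (hypothetical helper, not selscan's code).

    haplotypes: list of equal-length strings of phased alleles ('0'/'1').
    core:       index of the core marker.
    end:        index of the marker the interval extends to.
    allele:     core allele whose carriers are examined.
    """
    lo, hi = min(core, end), max(core, end)
    # Keep only haplotypes carrying the requested allele at the core marker,
    # trimmed to the interval between the core and the end marker.
    carriers = [h[lo:hi + 1] for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return 0.0  # no pairs to compare
    # Count identical haplotype pairs within each distinct haplotype class.
    counts = Counter(carriers)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Toy data (assumed for illustration): five haplotypes, four markers.
haps = ["0110", "0110", "0111", "0100", "1010"]
for end in range(4):
    print(end, ehh(haps, core=1, end=end, allele="1"))
```

EHH starts at 1 at the core marker and decays as the interval grows and carriers diverge. iHS and XPEHH are built on this kind of decay curve, which is why a scan over millions of markers benefits from an efficient, multithreaded implementation.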