Gautier Mathieu, Klassmann Alexander, Vitalis Renaud
INRA, UMR CBGP, Montferrier-sur-Lez, F-34988, France.
Institut de Biologie Computationnelle, Montpellier, F-34095, France.
Mol Ecol Resour. 2017 Jan;17(1):78-90. doi: 10.1111/1755-0998.12634. Epub 2016 Nov 28.
Identifying genomic regions with unusually high local haplotype homozygosity is a powerful strategy for characterizing candidate genes that respond to natural or artificial positive selection. To that end, statistics measuring the extent of haplotype homozygosity within populations (e.g. EHH, iHS) and between populations (Rsb or XP-EHH) have been proposed in the literature. The rehh package for R was previously developed to facilitate genome-wide scans for selection based on the analysis of long-range haplotypes. However, its performance was not sufficient to cope with the growing size of available data sets. Here, we propose a major upgrade of the rehh package, which includes improved processing of the input files, a faster algorithm to enumerate haplotypes, and multithreading. As illustrated with the analysis of large human haplotype data sets, these improvements decrease the computation time by more than one order of magnitude. This new version of rehh will thus make it possible to perform iHS-, Rsb- or XP-EHH-based scans on large data sets. The package rehh 2.0 is available from the CRAN repository (http://cran.r-project.org/web/packages/rehh/index.html) together with help files and a detailed manual.
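To make the notion of extended haplotype homozygosity (EHH) concrete, the sketch below computes it on a toy data set in plain Python. It is not the rehh implementation and does not use the rehh API. The function name `ehh`, its parameters, and the 0/1 haplotype matrix are assumptions made purely for illustration. It relies on the standard definition of EHH: among the chromosomes carrying a given core allele, the proportion of pairs that are identical at every marker from the core out to a given marker.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, allele, direction=1):
    """Illustrative EHH decay (hypothetical helper, not part of rehh).

    haplotypes: list of equal-length sequences of alleles (one per chromosome)
    core:       index of the focal marker
    allele:     core allele whose carriers are analysed
    direction:  +1 to walk towards higher indices, -1 towards lower ones
    Returns one EHH value per marker, starting at the core itself.
    """
    # Keep only the chromosomes that carry the chosen core allele.
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return []  # EHH is undefined with fewer than two carriers

    n_markers = len(haplotypes[0])
    values = []
    pos = core
    while 0 <= pos < n_markers:
        lo, hi = min(core, pos), max(core, pos)
        # Group carriers by their haplotype over the interval core..pos.
        groups = Counter(tuple(h[lo:hi + 1]) for h in carriers)
        # Fraction of carrier pairs that share the same extended haplotype.
        identical_pairs = sum(comb(size, 2) for size in groups.values())
        values.append(identical_pairs / comb(n, 2))
        pos += direction
    return values


# Toy data: five chromosomes typed at four biallelic markers (made up).
haps = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
]

right = ehh(haps, core=1, allele=1, direction=1)
left = ehh(haps, core=1, allele=1, direction=-1)
print(right, left)
```

EHH starts at 1 at the core marker and decays as recombination and mutation break up the shared haplotypes. Integrating this decay curve over distance yields the quantity on which iHS, Rsb and XP-EHH are built. The naive regrouping at every step is kept here for readability only; it is not the faster enumeration algorithm described in the abstract.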