Ferdosi Mohammad H, Kinghorn Brian P, van der Werf Julius H J, Lee Seung Hwan, Gondro Cedric
The Centre for Genetic Analysis and Applications, School of Environmental and Rural Science, University of New England, Armidale, Australia.
BMC Bioinformatics. 2014 Jun 7;15:172. doi: 10.1186/1471-2105-15-172.
Identifying recombination events and the chromosomal segments that contributed to an individual is useful for a number of applications in genomic analyses, including haplotyping, imputation, detection of signatures of selection, and improved estimates of relationship and of the probability of identity by descent. Genotypic data on half-sib family groups are widely available in livestock genomics. This family structure makes it possible to identify recombination events accurately even with only a few individuals, and it lends itself well to a range of applications such as parentage assignment and pedigree verification.
Here we present hsphase, an R package that exploits the genetic structure found in half-sib livestock data to identify and count recombination events, impute and phase un-genotyped sires, and phase their offspring. The package also allows reconstruction of family groups (pedigree inference), identification of pedigree errors, and parentage assignment. Additional functions in the package allow identification of genomic mapping errors, imputation of paternal high density genotypes from low density genotypes, and evaluation of phasing results from hsphase or from other phasing programs. Various diagnostic plotting functions permit rapid visual inspection of results and evaluation of datasets.
The hsphase package provides a suite of functions for analysis and visualization of genomic structures in half-sib family groups implemented in the widely used R programming environment. Low level functions were implemented in C++ and parallelized to improve performance. hsphase was primarily designed for use with high density SNP array data but it is fast enough to run directly on sequence data once they become more widely available. The package is available (GPL 3) from the Comprehensive R Archive Network (CRAN) or from http://www-personal.une.edu.au/~cgondro2/hsphase.htm.
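As an illustration of why half-sib data are so informative, the following minimal Python sketch shows the opposing-homozygote logic that underlies this kind of analysis. It is not the hsphase API, and it is not claimed to be the package's implementation. The function names, the genotype coding (0 and 2 for the two homozygotes, 1 for heterozygotes) and the missing-value code 9 are all assumptions made for this example. If two paternal half-sibs are opposite homozygotes at a SNP, the shared sire must have passed a different allele to each, so he must be heterozygous there. Likewise, a true parent and offspring should show essentially no opposing homozygotes apart from genotyping errors.

```python
MISSING = 9  # assumed code for a missing genotype (illustrative only)


def sire_heterozygous_sites(offspring):
    """Return SNP indices where the shared sire must be heterozygous.

    `offspring` is a list of genotype rows, one row per half-sib,
    coded 0/1/2 (assumed coding). A site qualifies when both
    homozygous classes (0 and 2) occur among the half-sibs.
    """
    n_snps = len(offspring[0])
    het_sites = []
    for j in range(n_snps):
        observed = {row[j] for row in offspring}
        if 0 in observed and 2 in observed:
            het_sites.append(j)
    return het_sites


def opposing_homozygotes(a, b):
    """Count SNPs where two individuals are opposite homozygotes.

    Missing genotypes never match {0, 2}, so they are ignored.
    """
    return sum(1 for x, y in zip(a, b) if {x, y} == {0, 2})


if __name__ == "__main__":
    half_sibs = [
        [0, 1, 2, 0],
        [2, 1, 2, 0],
        [1, 0, 2, MISSING],
    ]
    print(sire_heterozygous_sites(half_sibs))
    print(opposing_homozygotes(half_sibs[0], half_sibs[1]))
```

In this toy family only the first SNP shows both homozygous classes, so it is the only site at which the sire can be declared heterozygous. A high opposing-homozygote count between a candidate sire and an offspring would point to a pedigree error.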