Schäfer Christian, Schmidt Alexander H, Sauter Jürgen
DKMS gemeinnützige GmbH, Kressbach 1, 72072, Tübingen, Germany.
BMC Bioinformatics. 2017 May 30;18(1):284. doi: 10.1186/s12859-017-1692-y.
Knowledge of HLA haplotypes is helpful in many settings, such as disease association studies, population genetics, and hematopoietic stem cell transplantation. In the recruitment of unrelated hematopoietic stem cell donors, HLA haplotype frequencies of specific populations are used to optimize both donor searches for individual patients and strategic donor registry planning. However, estimating haplotype frequencies from HLA genotyping data is challenging because of the large volume of genotype data, the complex HLA nomenclature, and the heterogeneous and ambiguous nature of typing records.
To meet these challenges, we have developed the open-source software Hapl-o-Mat. It estimates haplotype frequencies from population data including an arbitrary number of loci using an expectation-maximization algorithm. Its key features are the processing of different HLA typing resolutions within a given population sample and the handling of ambiguities recorded via multiple allele codes or genotype list strings. Implemented in C++, Hapl-o-Mat facilitates efficient haplotype frequency estimation from large amounts of genotype data. We demonstrate its accuracy and performance on the basis of artificial and real genotype data.
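The expectation-maximization (EM) step mentioned above can be illustrated with a minimal C++17 sketch. It is not Hapl-o-Mat's implementation: it assumes Hardy-Weinberg equilibrium and assumes that each individual's typing (including ambiguities from multiple allele codes or GL strings) has already been expanded into a list of candidate diplotypes, i.e. unordered pairs of haplotypes written in GL String style such as `A*01~B*08`. The names `Haplotype`, `Diplotype`, `Individual`, and `estimateFrequencies` are hypothetical. The E-step distributes each individual's two haplotype copies over its candidate diplotypes in proportion to their current probabilities; the M-step turns the expected counts back into frequencies.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration (not Hapl-o-Mat's data model).
// A haplotype is written in GL String style, e.g. "A*01~B*08".
using Haplotype = std::string;
// An unordered pair of haplotypes; list each unordered pair only once.
using Diplotype = std::pair<Haplotype, Haplotype>;
// All diplotypes compatible with one individual's (possibly ambiguous) typing.
using Individual = std::vector<Diplotype>;

// Estimates haplotype frequencies by EM, assuming Hardy-Weinberg equilibrium.
std::map<Haplotype, double> estimateFrequencies(const std::vector<Individual>& sample,
                                                int maxIter = 1000,
                                                double tol = 1e-10) {
    assert(!sample.empty() && "sample must contain at least one individual");

    // Start from a uniform distribution over every haplotype that occurs
    // in at least one candidate diplotype.
    std::map<Haplotype, double> freq;
    for (const auto& ind : sample)
        for (const auto& d : ind) {
            freq[d.first] = 0.0;
            freq[d.second] = 0.0;
        }
    for (auto& entry : freq) entry.second = 1.0 / static_cast<double>(freq.size());

    for (int iter = 0; iter < maxIter; ++iter) {
        // E-step: expected haplotype counts given the current frequencies.
        std::map<Haplotype, double> counts;
        double contributed = 0.0;  // number of haplotype copies assigned
        for (const auto& ind : sample) {
            std::vector<double> weight(ind.size());
            double total = 0.0;
            for (std::size_t i = 0; i < ind.size(); ++i) {
                const Diplotype& d = ind[i];
                // P(h1/h2) = p1^2 if homozygous, 2 * p1 * p2 otherwise.
                const double factor = (d.first == d.second) ? 1.0 : 2.0;
                weight[i] = factor * freq[d.first] * freq[d.second];
                total += weight[i];
            }
            if (total == 0.0) continue;  // skip individuals with no usable candidate
            for (std::size_t i = 0; i < ind.size(); ++i) {
                counts[ind[i].first] += weight[i] / total;
                counts[ind[i].second] += weight[i] / total;
            }
            contributed += 2.0;
        }
        if (contributed == 0.0) break;

        // M-step: new frequency = expected count / number of haplotype copies.
        double maxChange = 0.0;
        for (auto& entry : freq) {
            const double updated = counts[entry.first] / contributed;
            maxChange = std::max(maxChange, std::fabs(updated - entry.second));
            entry.second = updated;
        }
        if (maxChange < tol) break;
    }
    return freq;
}
```

For example, with one `A*01~B*08` homozygote, one `A*02~B*07` homozygote, and one double heterozygote whose phase is unknown (candidates `A*01~B*08`/`A*02~B*07` or `A*01~B*07`/`A*02~B*08`), the iterations push the ambiguous individual toward the haplotypes already supported by the homozygotes, so `A*01~B*08` and `A*02~B*07` each converge to about 0.5 while the other two approach 0.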
Hapl-o-Mat is versatile and efficient software for HLA haplotype frequency estimation. Its ability to process various forms of HLA genotype data allows straightforward haplotype frequency estimation from the typing records commonly found in stem cell donor registries.