Eberhard H-P, Madbouly A S, Gourraud P A, Balère M L, Feldmann U, Gragert L, Maldonado Torres H, Pingel J, Schmidt A H, Steiner D, van der Zanden H G M, Oudshoorn M, Marsh S G E, Maiers M, Müller C R
Zentrales Knochenmarkspender-Register Deutschland (ZKRD), Ulm, Germany.
Tissue Antigens. 2013 Aug;82(2):93-105. doi: 10.1111/tan.12160.
Estimation of human leukocyte antigen (HLA) haplotype frequencies from unrelated stem cell donor registries presents a challenge because of large sample sizes and heterogeneity of HLA typing data. For the 14th International HLA and Immunogenetics Workshop, five bioinformatics groups initiated the 'Registry Diversity Component' aiming to cross-validate and improve current haplotype estimation tools. Five datasets were derived from different donor registries and then used as input for five different computer programs for haplotype frequency estimation. Because of issues related to heterogeneity and complexity of HLA typing data identified in the initial phase, the same five implementations, and two new ones, were used on simulated datasets in a controlled experiment where the correct results were known a priori. These datasets, modeled after European haplotype frequencies, contained various fractions of donors with missing HLA-DR typing. We measured the contribution of sampling fluctuation and estimation error to the deviation of the frequencies from their true values, finding equivalent contributions of each for the chosen samples. Because of patient-directed activities, selective prospective typing strategies and the variety and evolution of typing technology, some donors have more complete and better HLA data. In this setting, we show that restricting estimation to fully typed individuals introduces biases that could be overcome by including all donors in frequency estimation. Our study underlines the importance of critical review and validation of tools in registry-related activity and provides a sustainable framework for validating the computational tools used. Accurate frequencies are essential for match prediction to improve registry operations and to help more patients identify suitably matched donors.
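The abstract does not say which algorithm the participating programs use. As an illustration only, the sketch below assumes an expectation-maximization (EM) estimator, a common choice for haplotype frequency estimation, and shows how a donor with missing typing at one locus can still contribute to the estimate. The two-locus setup, the allele labels, and the function names `compatible_pairs` and `em_haplotype_frequencies` are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(a_type, b_type, b_alleles):
    """List ordered haplotype pairs consistent with one donor's typing.

    a_type: the two alleles at locus A.
    b_type: the two alleles at locus B, or None when locus B is untyped,
            in which case every assignment of B alleles is allowed.
    """
    a1, a2 = a_type
    if b_type is None:
        b_options = product(b_alleles, repeat=2)
    else:
        b1, b2 = b_type
        b_options = [(b1, b2), (b2, b1)]
    return [((a1, x), (a2, y)) for x, y in b_options]


def em_haplotype_frequencies(donors, n_iter=200):
    """Estimate two-locus haplotype frequencies by EM (illustrative sketch)."""
    a_alleles = sorted({a for a_type, _ in donors for a in a_type})
    b_alleles = sorted({b for _, b_type in donors if b_type for b in b_type})
    haplotypes = list(product(a_alleles, b_alleles))
    # Start from a uniform distribution over all candidate haplotypes.
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    pairs_per_donor = [compatible_pairs(a, b, b_alleles) for a, b in donors]

    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: spread each donor over its compatible haplotype pairs,
        # weighted by the product of the current haplotype frequencies.
        for pairs in pairs_per_donor:
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: each donor carries two haplotypes.
        freq = {h: counts[h] / (2 * len(donors)) for h in haplotypes}
    return freq


# Hypothetical toy registry: four fully typed donors and one donor
# whose locus B typing is missing (None).
donors = [
    (("A1", "A1"), ("B1", "B1")),
    (("A1", "A1"), ("B1", "B1")),
    (("A1", "A1"), ("B1", "B1")),
    (("A2", "A2"), ("B2", "B2")),
    (("A1", "A1"), None),
]
freq = em_haplotype_frequencies(donors)
```

In this toy example the partially typed donor is kept in the sample and its untyped locus is resolved probabilistically from the current estimates, rather than the donor being discarded. Real registry data involve more loci, ambiguous typing results, and far larger samples, none of which this sketch attempts to handle.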