Yu Dottie, Ayyala Ram, Sadek Sarah Hany, Chittampalli Likhitha, Farooq Hafsa, Jung Junghyun, Nahid Abdullah Al, Boldirev Grigore, Jung Mina, Park Sungmin, Nguyen Austin, Zelikovsky Alex, Mancuso Nicholas, Joo Jong Wha J, Thompson Reid F, Alachkar Houda, Mangul Serghei
Department of Quantitative and Computational Biology, Dornsife College of Letters, Arts and Sciences, University of Southern California, 1975 Zonal Ave, Los Angeles, CA 90033, USA.
Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, USA.
bioRxiv. 2024 Jan 16:2023.05.22.541750. doi: 10.1101/2023.05.22.541750.
Accurate identification of human leukocyte antigen (HLA) alleles is essential for various clinical and research applications, such as transplant matching and the assessment of drug sensitivity. Recent advances in RNA-seq technology have made it possible to impute HLA types from sequencing data, spurring the development of a large number of computational HLA typing tools. However, the relative performance of these tools is unknown, limiting the ability of clinicians and biomedical researchers to make informed choices about which tool to use. Here we report the study design of a comprehensive benchmark of 12 HLA callers across 682 RNA-seq samples from 8 datasets with a molecularly defined gold standard at 5 loci: HLA-A, -B, -C, -DRB1, and -DQB1. For each HLA typing tool, we will comprehensively assess its accuracy, compare default with optimized parameters, and examine discrepancies in accuracy at the allele and locus levels. We will also evaluate the computational expense of each HLA caller, measured in terms of CPU time and RAM, and the influence of read length over the HLA region on the accuracy of each tool. Most notably, we will examine the performance of HLA callers across European and African groups to determine discrepancies in accuracy associated with ancestry. We hypothesize that RNA-seq HLA callers are capable of returning high-quality results, but that tools offering a good balance between accuracy and computational expense for all ancestry groups are yet to be developed. We believe that our study will provide clinicians and researchers with clear guidance to inform their selection of an appropriate HLA caller.
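The abstract does not specify how accuracy will be scored, so the following is only a minimal sketch of one plausible per-locus scoring scheme, not the authors' method. It assumes allele names of the form `GENE*field:field[:...]`, compares calls at two-field resolution, ignores the order of the two alleles, and counts missing calls as incorrect. The function names (`truncate`, `matched_alleles`, `locus_accuracy`) and the nested-dictionary input layout are hypothetical.

```python
from collections import Counter


def truncate(allele, fields=2):
    """Keep the first `fields` colon-separated fields, e.g. 'A*02:01:01' -> 'A*02:01'."""
    gene, _, rest = allele.partition("*")
    return gene + "*" + ":".join(rest.split(":")[:fields])


def matched_alleles(predicted, truth, fields=2):
    """Count predicted alleles that match the gold standard, ignoring allele order."""
    pred = Counter(truncate(a, fields) for a in predicted)
    gold = Counter(truncate(a, fields) for a in truth)
    # Multiset intersection handles homozygous samples correctly.
    return sum((pred & gold).values())


def locus_accuracy(calls, gold, fields=2):
    """Fraction of gold-standard alleles recovered at each locus.

    `calls` and `gold` map sample -> locus -> tuple of alleles (assumed layout).
    A sample or locus missing from `calls` contributes zero matches.
    """
    correct, total = Counter(), Counter()
    for sample, loci in gold.items():
        for locus, truth in loci.items():
            predicted = calls.get(sample, {}).get(locus, ())
            correct[locus] += matched_alleles(predicted, truth, fields)
            total[locus] += len(truth)
    return {locus: correct[locus] / total[locus] for locus in total}
```

Scoring each allele separately, rather than requiring both alleles of a sample to be correct, is a design choice made here for illustration; a stricter genotype-level metric would give lower values on the same calls.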