Lan James H, Yin Yuxin, Reed Elaine F, Moua Kevin, Thomas Kimberly, Zhang Qiuheng
UCLA Immunogenetics Center, Department of Pathology & Laboratory Medicine, Los Angeles, CA, USA; University of British Columbia, Clinician Investigator Program, Vancouver, BC, Canada.
UCLA Immunogenetics Center, Department of Pathology & Laboratory Medicine, Los Angeles, CA, USA.
Hum Immunol. 2015 Mar;76(2-3):166-75. doi: 10.1016/j.humimm.2014.12.016. Epub 2014 Dec 25.
Next-generation sequencing (NGS) is increasingly recognized for its ability to overcome allele ambiguity and deliver high-resolution typing in the HLA system. With this technology, however, non-uniform read distribution can impede reliable variant detection, making high-confidence genotype calling particularly difficult in the polymorphic HLA complex. Recently, library construction has been implicated as the dominant source of coverage bias. To study the impact of this phenomenon on HLA genotyping, we performed long-range PCR on 12 samples to amplify HLA-A, -B, -C, -DRB1, and -DQB1, and compared the relative contribution of three Illumina library construction methods (TruSeq Nano, Nextera, and Nextera XT) to downstream bias. We show that high GC% is a good predictor of low sequencing depth. Compared with standard TruSeq Nano, GC bias was more prominent in the transposase-based protocols, particularly Nextera XT, likely because transposase insertion bias is compounded by a high number of PCR enrichment cycles. Importantly, our findings demonstrate that non-uniform read depth can have a direct, negative impact on the robustness of HLA genotyping, which has clinical implications for users choosing a library construction strategy that balances cost and throughput against data quality.
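The relationship between GC% and sequencing depth described in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the function names (`gc_fraction`, `windowed_gc_and_depth`, `pearson_r`), the non-overlapping 100 bp window, and the toy sequence and depth values are all assumptions made for illustration. The sketch assumes an amplicon reference sequence and a per-base depth list of equal length, computes the GC fraction and mean depth of each window, and reports their Pearson correlation. A strongly negative coefficient would be consistent with high GC% predicting low depth.

```python
from statistics import mean


def gc_fraction(seq):
    """Return the fraction of G and C bases in seq (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc_and_depth(reference, depth, window=100):
    """Pair GC fraction with mean depth for each non-overlapping window.

    reference: amplicon sequence as a string (hypothetical input).
    depth: per-base read depth, one value per reference position.
    A trailing partial window is ignored.
    """
    if len(reference) != len(depth):
        raise ValueError("reference and depth must have the same length")
    rows = []
    for start in range(0, len(reference) - window + 1, window):
        end = start + window
        rows.append((gc_fraction(reference[start:end]), mean(depth[start:end])))
    return rows


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    # Toy data, invented for illustration only: an AT-only window, a GC-only
    # window, and a mixed window, with depth falling as GC content rises.
    reference = "AT" * 50 + "GC" * 50 + "ATGC" * 25
    depth = [300] * 100 + [60] * 100 + [180] * 100
    rows = windowed_gc_and_depth(reference, depth, window=100)
    gc_values = [gc for gc, _ in rows]
    depth_values = [d for _, d in rows]
    print("per-window (GC fraction, mean depth):", rows)
    print("Pearson r:", pearson_r(gc_values, depth_values))
```

In practice the per-base depth would come from aligned reads rather than a hand-written list, and the window size would be chosen to suit the amplicon length and read length. Both choices are left open here.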