Division of Basic Medical Science and Molecular Medicine, Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Japan.
Molecular Biology Applications, Pacific Biosciences, Inc, Menlo Park, CA, United States.
Front Immunol. 2018 Oct 4;9:2294. doi: 10.3389/fimmu.2018.02294. eCollection 2018.
Although NGS technologies fuel advances in high-throughput HLA genotyping methods for identification and classification of HLA genes to assist with precision medicine efforts in disease and transplantation, the efficiency of these methods is impeded by the absence of adequately characterized high-frequency HLA allele reference sequence databases for the highly polymorphic HLA gene system. Here, we report on producing a comprehensive collection of full-length HLA allele sequences for eight classical HLA loci found in the Japanese population. We augmented the second-generation short-read data generated by the Ion Torrent technology with long, amplicon-spanning consensus reads delivered by the third-generation SMRT sequencing method to create reference-grade, high-quality sequences of HLA class I and II gene alleles resolved at the genomic coding and non-coding level. Forty-six DNAs were obtained from a reference set used previously to establish the HLA allele frequency data in Japanese subjects. The samples included alleles with a collective allele frequency in the Japanese population of more than 99.2%. The HLA loci were independently amplified by long-range PCR using previously designed HLA locus-specific primers and subsequently sequenced using SMRT and Ion PGM sequencers. The mapped long and short reads were used to produce a reference library of consensus HLA allelic sequences with the help of the reference-aware software tool LAA for SMRT Sequencing. A total of 253 distinct alleles were determined for 46 healthy subjects. Of these, 137 were novel alleles: 101 SNVs and/or indels and 36 extended alleles at a partial or full-length level. Comparing the HLA sequences from the perspective of nucleotide diversity revealed that HLA-DRB1 was the most divergent among the eight HLA genes, and that the HLA-DPB1 gene sequences diverged into two distinct groups, DP2 and DP5, with evidence of independent polymorphisms generated in exon 2. We also identified two specific intronic variations in HLA-DRB1 that might be involved in rheumatoid arthritis. In conclusion, full-length HLA allele sequencing by third-generation and second-generation technologies has provided polymorphic gene reference sequences at a genomic allelic resolution, including allelic variations assigned up to the field-4 level, for a stronger foundation in precision medicine and HLA-related disease and transplantation studies.
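As an illustration of the nucleotide diversity comparison mentioned above, the following is a minimal sketch of one common way to compute it: the mean proportion of differing sites over all pairs of aligned sequences. It is not the authors' pipeline. The function names and the toy sequences are hypothetical, the input is assumed to be pre-aligned to equal length, each sequence is counted once (no allele-frequency weighting), and sites with a gap or N in either sequence are skipped.

```python
from itertools import combinations


def pairwise_difference(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or 'N' are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a in "-N" or base_b in "-N":
            continue
        compared += 1
        if base_a != base_b:
            differences += 1
    return differences / compared if compared else 0.0


def nucleotide_diversity(aligned_seqs):
    """Unweighted mean pairwise difference per site across all sequence pairs."""
    pairs = list(combinations(aligned_seqs, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)


# Toy aligned sequences (made up for illustration, not real HLA alleles).
toy_alignment = ["ACGT", "ACGA", "ACGT"]
pi = nucleotide_diversity(toy_alignment)
```

Applied per locus to an alignment of that locus's allele sequences, a higher value indicates greater divergence among the alleles; this is the sense in which one gene can be described as more divergent than another.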