Division of Human Genetics, National Institute of Genetics, Shizuoka, Japan.
BMC Genomics. 2013 May 28;14:355. doi: 10.1186/1471-2164-14-355.
The human leukocyte antigen (HLA) region, a 3.8-Mb segment of the human genome at 6p21, has been associated with more than 100 different diseases, most of them autoimmune. Because of the complex nature of HLA genes, conventional sequencing methods have difficulty resolving complete HLA gene sequences, and HLA gene haplotype structures in particular. We propose a novel, accurate, and cost-effective method for generating phase-defined complete sequences of HLA genes using indexed multiplex next-generation sequencing.
A total of 33 HLA homozygous samples, 11 HLA heterozygous samples, and 3 parent-child families were subjected to phase-defined HLA gene sequencing. We applied long-range PCR to amplify six HLA genes (HLA-A, -C, -B, -DRB1, -DQB1, and -DPB1), followed by transposase-based library construction and multiplex sequencing on the MiSeq sequencer. Paired-end reads (2 × 250 bp) from the sequencer were aligned to the six HLA gene segments of UCSC hg19, allowing at most 80 mismatched bases. For HLA homozygous samples, the six amplicons of an individual were pooled, then sequenced and mapped together (the individual-tagging method). Aligning the paired-end reads to the corresponding genes of UCSC hg19 yielded unambiguous, continuous sequences. For HLA heterozygous samples, each amplicon was sequenced and mapped separately (the gene-tagging method). After alignment, we detected informative paired-end reads harboring SNVs on both the forward and reverse reads; these were used to separate the two chromosomes and to generate two phase-defined sequences per individual. Consequently, we were able to determine phase-defined HLA gene sequences from the promoter to the 3'-UTR and to assign up to 8-digit HLA allele numbers, regardless of whether the alleles were rare or novel. Parent-child trio-based sequencing validated our sequencing and phasing methods.
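The phasing step described above, in which read pairs carrying SNVs on both mates link heterozygous sites into two chromosomes, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `phase_snvs`, the input layout (one dictionary of position to observed base per read pair), and the majority-vote rule are all assumptions made for illustration, and the sketch assumes that reads are already aligned and heterozygous SNVs already called.

```python
from collections import Counter


def phase_snvs(sites, read_pairs):
    """Split heterozygous SNVs into two haplotypes using linking read pairs.

    sites: dict mapping position -> (allele_a, allele_b) for each heterozygous SNV.
    read_pairs: list of dicts, one per read pair, mapping position -> observed base
        (bases seen on the forward and reverse reads merged into one dict).
    Returns (hap1, hap2), each a dict mapping position -> base.
    Raises ValueError if some site cannot be linked to the phased set.
    """
    positions = sorted(sites)
    if not positions:
        return {}, {}

    # Seed: the first site's alleles define haplotype 1 and haplotype 2.
    first = positions[0]
    hap1 = {first: sites[first][0]}
    hap2 = {first: sites[first][1]}
    pending = positions[1:]

    while pending:
        progressed = False
        for pos in list(pending):
            a, b = sites[pos]
            votes = Counter()  # votes[x] = evidence that base x lies on hap1
            for pair in read_pairs:
                if pair.get(pos) not in (a, b):
                    continue
                here = pair[pos]
                for other, base in pair.items():
                    if other == pos or other not in hap1:
                        continue
                    if base == hap1[other]:
                        # The pair comes from hap1, so its base here is on hap1.
                        votes[here] += 1
                    elif base == hap2[other]:
                        # The pair comes from hap2, so the other allele is on hap1.
                        votes[b if here == a else a] += 1
            if votes[a] == votes[b]:
                continue  # no linking evidence yet, or a tie
            on_hap1 = a if votes[a] > votes[b] else b
            hap1[pos] = on_hap1
            hap2[pos] = b if on_hap1 == a else a
            pending.remove(pos)
            progressed = True
        if not progressed:
            raise ValueError(f"sites not linked by informative pairs: {pending}")
    return hap1, hap2


# Hypothetical toy input: three heterozygous SNVs and five informative pairs.
example_sites = {100: ("A", "G"), 250: ("C", "T"), 600: ("G", "T")}
example_pairs = [
    {100: "A", 250: "T"},
    {100: "G", 250: "C"},
    {250: "T", 600: "G"},
    {250: "C", 600: "T"},
    {100: "A", 250: "T"},
]
example_hap1, example_hap2 = phase_snvs(example_sites, example_pairs)
```

In this toy input, `example_hap1` comes out as A-T-G and `example_hap2` as G-C-T across positions 100, 250, and 600. A site with no read pair connecting it to an already phased site raises an error instead of being assigned arbitrarily, which mirrors the requirement for informative pairs with SNVs on both reads.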
Our protocol generated phase-defined sequences of entire HLA genes, resulting in high-resolution HLA typing and detection of new alleles.