Bioinformatics and Computational Biology, University of Minnesota, Rochester, MN, United States.
DNA Identification Testing Division, Laboratory Corporation of America Holdings, Burlington, NC, United States.
Front Immunol. 2020 Oct 9;11:582927. doi: 10.3389/fimmu.2020.582927. eCollection 2020.
The homology, recombination, variation, and repetitive elements in the natural killer-cell immunoglobulin-like receptor (KIR) region have made full haplotype DNA interpretation impossible in a high-throughput workflow. Here, we present a new approach that uses long-read sequencing to efficiently capture, sequence, and assemble diploid human KIR haplotypes. Probes were designed to capture KIR fragments efficiently by leveraging the repeating homology of the region. IDT xGen Lockdown probes were used to capture 2-8 kb fragments of sheared DNA, which were then sequenced on a PacBio Sequel. The sequences were error corrected, binned, and then assembled using the Canu assembler. The workflow also annotates the locations of genes and their exon/intron boundaries. The assembly and annotation were evaluated on 16 individuals (8 African American and 8 European) for whom ground truth was known from long-range sequencing with fosmid library preparation. Using only 18 capture probes, the assemblies cover 97% of the GenBank reference and are 99.97% concordant, and only 1.8 haplotigs are needed to cover 75% of the reference. We also report the first assembly of diploid KIR haplotypes from long-read WGS. Our targeted hybridization probe capture and sequencing approach is the first of its kind to fully sequence and phase all diploid human KIR haplotypes, and it is efficient enough for population-scale studies and clinical use. The free, open-source software is available at https://github.com/droeatumn/kass and is supported by a Docker environment at https://hub.docker.com/repository/docker/droeatumn/kass.
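The three evaluation figures reported for the assemblies (reference coverage, concordance, and the number of haplotigs needed to cover 75% of the reference) can be illustrated with a small Python sketch. This is not KASS code and not necessarily the paper's evaluation method: the `Alignment` record and the functions `merged_coverage`, `concordance`, and `haplotigs_to_cover` are hypothetical, and the definitions used here (merged-interval coverage, identical bases over aligned columns, and an LG75-style count ranked by reference span) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Alignment:
    """Hypothetical, simplified record of one haplotig aligned to a reference."""
    ref_start: int  # 0-based start on the reference (inclusive)
    ref_end: int    # 0-based end on the reference (exclusive)
    matches: int    # identical aligned bases
    aligned: int    # aligned columns: matches + mismatches + indel columns


def merged_coverage(alignments: List[Alignment], ref_len: int) -> float:
    """Fraction of reference bases covered by at least one haplotig.

    Overlapping or touching intervals are merged so no base is counted twice.
    """
    covered = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted((a.ref_start, a.ref_end) for a in alignments):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval (if any) and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlap: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / ref_len


def concordance(alignments: List[Alignment]) -> float:
    """Identical bases divided by all aligned columns, pooled over haplotigs."""
    total_aligned = sum(a.aligned for a in alignments)
    if total_aligned == 0:
        return 0.0
    return sum(a.matches for a in alignments) / total_aligned


def haplotigs_to_cover(alignments: List[Alignment], ref_len: int,
                       fraction: float = 0.75) -> Optional[int]:
    """LG75-style count (assumed definition): take haplotigs longest reference
    span first and return how many are needed before their summed spans reach
    fraction * ref_len, or None if that target is never reached.
    Overlapping spans are not deduplicated here.
    """
    target = fraction * ref_len
    spans = sorted((a.ref_end - a.ref_start for a in alignments), reverse=True)
    total = 0
    for count, span in enumerate(spans, start=1):
        total += span
        if total >= target:
            return count
    return None


if __name__ == "__main__":
    ref_len = 150_000  # hypothetical reference haplotype length (bp)
    alns = [
        Alignment(0, 100_000, 99_970, 100_000),
        Alignment(95_000, 146_000, 50_985, 51_000),
    ]
    print(f"coverage    {merged_coverage(alns, ref_len):.4f}")  # 0.9733
    print(f"concordance {concordance(alns):.4f}")               # 0.9997
    print(f"haplotigs   {haplotigs_to_cover(alns, ref_len)}")   # 2
```

With these made-up numbers, two overlapping haplotigs cover about 97% of the reference, and both are needed to reach the 75% threshold. Because such a count is an integer for each haplotype, a non-integer figure like 1.8 presumably reflects an aggregate over haplotypes; how the paper aggregates it is not stated in this abstract.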