Bioinformatics and Computational Biology, University of Minnesota, Rochester, MN, United States.
Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, United States.
Front Immunol. 2020 Nov 26;11:583013. doi: 10.3389/fimmu.2020.583013. eCollection 2020.
The killer-cell immunoglobulin-like receptor (KIR) proteins evolve to fight viruses and mediate the body's reaction to pregnancy. These roles provide selection pressure for variation at both the structural/haplotype and base/allele levels. At the same time, the genes have evolved relatively recently by tandem duplication and therefore exhibit very high sequence similarity over thousands of bases. These variation-homology patterns make it impossible to interpret KIR haplotypes from abundant short-read genome sequencing data at population scale using existing methods. Here, we developed an efficient computational approach for KIR probe interpretation (KPI) to accurately interpret an individual's KIR genes and haplotype pairs from KIR sequencing reads. We designed synthetic 25-base sequence probes by analyzing previously reported haplotype sequences, and we developed a bioinformatics pipeline to interpret the probes in the context of 16 KIR genes and 16 haplotype structures. We demonstrated its accuracy on a synthetic data set as well as on real whole-genome sequences from 748 individuals from The Genome of the Netherlands (GoNL). The GoNL predictions were compared with SNP-based predictions. Our results show a 100% accuracy rate for the synthetic tests and a 99.6% family-consistency rate in the GoNL tests. Agreement with the SNP-based calls on KIR genes ranges from 72% to 100% with a mean of 92%; most differences occur in four genes where KPI predicts presence and the SNP-based interpretation predicts absence. Overall, the evidence suggests that KPI's accuracy is 97% or greater for both KIR gene and haplotype-pair predictions, and that presence/absence genotyping leads to ambiguous haplotype-pair predictions when interpreted against the 16 reference KIR haplotype structures. KPI is free, open, and easily executable as a Nextflow workflow supported by a Docker environment at https://github.com/droeatumn/kpi.
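The two ideas in the abstract, calling gene presence from 25-base probes and then listing the reference haplotype pairs consistent with those calls, can be illustrated with a minimal Python sketch. This is not KPI's implementation: the function names, the `min_hits` threshold, and the toy genes, probes, and haplotypes (`geneA`, `hapX`, and so on) are all hypothetical, and the sketch ignores reverse-complement reads and sequencing errors.

```python
from itertools import combinations_with_replacement

K = 25  # probe length stated in the abstract


def kmers(seq, k=K):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def call_genes(reads, probes_by_gene, min_hits=1):
    """Call a gene present when at least min_hits of its probes occur in the reads.

    min_hits is an assumed threshold for illustration only.
    """
    seen = set()
    for read in reads:
        seen |= kmers(read)
    return {gene for gene, probes in probes_by_gene.items()
            if len(probes & seen) >= min_hits}


def haplotype_pairs(present, haplotypes):
    """List reference haplotype pairs whose combined gene content equals the calls."""
    return [(a, b)
            for a, b in combinations_with_replacement(sorted(haplotypes), 2)
            if haplotypes[a] | haplotypes[b] == present]


# Toy data (hypothetical; not real KIR probes or haplotype structures).
probes = {
    "geneA": {"ACGT" * 6 + "A"},
    "geneB": {"TTGCA" * 5},
    "geneC": {"GGATC" * 5},
}
reads = ["CC" + "ACGT" * 6 + "A" + "GG", "TTGCA" * 5 + "AAAA"]
haplotypes = {
    "hapX": {"geneA"},
    "hapY": {"geneA", "geneB"},
    "hapZ": {"geneB"},
}

present = call_genes(reads, probes)
pairs = haplotype_pairs(present, haplotypes)
print(sorted(present))
print(pairs)
```

In this toy case the presence calls are compatible with more than one haplotype pair, which is the kind of ambiguity the abstract attributes to presence/absence genotyping.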