Improved haplotype resolution of highly duplicated MHC genes in a long-read genome assembly using MiSeq amplicons.

Author information

Department of Biology, Molecular Ecology and Evolution Lab, Lund University, Lund, Sweden.

Department of Biology and Environmental Science, Faculty of Health and Life Sciences, Linnaeus University, Kalmar, Sweden.

Publication information

PeerJ. 2023 Jul 12;11:e15480. doi: 10.7717/peerj.15480. eCollection 2023.

Abstract

Long-read sequencing offers a great improvement in the assembly of complex genomic regions, such as the major histocompatibility complex (MHC) region, which can contain both tandemly duplicated MHC genes (paralogs) and high repeat content. The MHC genes have expanded in passerine birds, resulting in numerous MHC paralogs, with relatively high sequence similarity, making the assembly of the MHC region challenging even with long-read sequencing. In addition, MHC genes show rather high sequence divergence between alleles, making diploid-aware assemblers incorrectly classify haplotypes from the same locus as sequences originating from different genomic regions. Consequently, the number of MHC paralogs can easily be over- or underestimated in long-read assemblies. We therefore set out to verify the MHC diversity in an original and a haplotype-purged long-read assembly of one great reed warbler individual (the focal individual) by using Illumina MiSeq amplicon sequencing. Single exons, representing MHC class I (MHC-I) and class IIB (MHC-IIB) alleles, were sequenced in the focal individual and mapped to the annotated MHC alleles in the original long-read genome assembly. Eighty-four percent of the annotated MHC-I alleles in the original long-read genome assembly were detected using 55% of the amplicon alleles and likewise, 78% of the annotated MHC-IIB alleles were detected using 61% of the amplicon alleles, indicating an incomplete annotation of MHC genes. In the haploid genome assembly, each MHC-IIB gene should be represented by one allele. The parental origin of the MHC-IIB amplicon alleles in the focal individual was determined by sequencing MHC-IIB in its parents. 
Two of five larger scaffolds, containing 6-19 MHC-IIB paralogs, had a maternal and paternal origin, respectively, as well as a high nucleotide similarity, which suggests that these scaffolds had been incorrectly assigned as belonging to different loci in the genome rather than as alternate haplotypes of the same locus. Therefore, the number of MHC-IIB paralogs was overestimated in the haploid genome assembly. Based on our findings we propose amplicon sequencing as a suitable complement to long-read sequencing for independent validation of the number of paralogs in general and for haplotype inference in multigene families in particular.
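The two bookkeeping steps described in the abstract (checking which annotated alleles are recovered by amplicon alleles, and assigning amplicon alleles a parental origin) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes that an amplicon allele "detects" an annotated allele when the amplicon exon sequence occurs exactly within the annotated sequence, and that parental origin is decided by simple presence or absence in each parent. All function names, allele names, and sequences are hypothetical toy values.

```python
def detection_rates(annotated, amplicon):
    """Return (share of annotated alleles detected, share of amplicon alleles used).

    annotated, amplicon: dicts mapping allele name -> nucleotide sequence.
    Assumption: a match means the amplicon sequence is an exact substring
    of the annotated sequence (a real analysis would use read mapping).
    """
    detected = {name for name, seq in annotated.items()
                if any(amp in seq for amp in amplicon.values())}
    used = {name for name, amp in amplicon.items()
            if any(amp in seq for seq in annotated.values())}
    return len(detected) / len(annotated), len(used) / len(amplicon)


def parental_origin(allele, mother_alleles, father_alleles):
    """Classify an offspring allele by its presence in each parent's allele set."""
    in_mother = allele in mother_alleles
    in_father = allele in father_alleles
    if in_mother and not in_father:
        return "maternal"
    if in_father and not in_mother:
        return "paternal"
    if in_mother and in_father:
        return "ambiguous"   # shared by both parents, origin cannot be inferred
    return "unassigned"      # found in neither parent


# Hypothetical toy data, not taken from the study.
annotated = {"ann1": "AAACCCGGG", "ann2": "TTTCCCGGA", "ann3": "GGGTTTAAA"}
amplicon = {"amp1": "ACCCG", "amp2": "TCCCGGA", "amp3": "CACACA", "amp4": "GAGAGA"}
mother = {"amp1", "amp3"}
father = {"amp2", "amp3"}

share_detected, share_used = detection_rates(annotated, amplicon)
origins = {name: parental_origin(name, mother, father) for name in amplicon}
```

In this toy case, amplicon alleles that match nothing in the annotation lower the second ratio, which is the kind of signal the abstract interprets as incomplete annotation. Scaffolds whose alleles are consistently maternal or consistently paternal would then be candidates for alternate haplotypes of one locus, subject to the sequence-similarity check the abstract describes.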

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d266/10349553/2c4a3f45afa3/peerj-11-15480-g001.jpg
