Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
Mol Microbiol. 2020 Jun;113(6):1209-1224. doi: 10.1111/mmi.14488. Epub 2020 Mar 17.
Candida glabrata is an opportunistic pathogen in humans, responsible for approximately 20% of disseminated candidiasis. Candida glabrata's ability to adhere to host tissue is mediated by GPI-anchored cell wall proteins (GPI-CWPs); the corresponding genes contain long tandem repeat regions. These repeat regions caused assembly errors in the reference genome. Here, we performed a de novo assembly of the C. glabrata type strain CBS138 using long single-molecule real-time reads, with short Illumina reads for refinement, and constructed telomere-to-telomere assemblies of all 13 chromosomes. Our assembly agrees very well overall with the current reference genome, but we made substantial corrections within tandem repeat regions. Specifically, we removed 62 genes, of which 45 were scrambled due to misassembly in the reference. We annotated 31 novel ORFs, of which 24 are GPI-CWPs. In addition, we corrected the tandem repeat structure of a further 21 genes. Our corrections to the genome were substantial, with the length of new genes and tandem repeat corrections amounting to approximately 3.8% of the ORFeome length. As most corrections were within the coding regions of GPI-CWP genes, our genome assembly establishes a high-quality reference set of genes and repeat structures for the functional analysis of these cell surface proteins.
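To illustrate why long tandem repeats can mislead an assembly built from short reads, the following is a minimal toy sketch, not the authors' pipeline. The sequences, the repeat unit, and the function names `kmers` and `distinguishable` are hypothetical and chosen only for illustration. It treats every length-k substring as an idealized error-free read and asks whether two genes that differ only in repeat copy number yield different read sets.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq (idealized reads)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# Hypothetical toy sequences: identical flanks and repeat unit,
# differing only in the number of tandem copies.
LEFT, UNIT, RIGHT = "ACGTTGCA", "GATTACA", "TTGGCCAA"
gene_5_copies = LEFT + UNIT * 5 + RIGHT   # 51 bases
gene_9_copies = LEFT + UNIT * 9 + RIGHT   # 79 bases


def distinguishable(k: int) -> bool:
    """True if reads of length k give different read sets for the two genes."""
    return kmers(gene_5_copies, k) != kmers(gene_9_copies, k)


# Reads shorter than the repeat tract see the same content in both genes,
# so the copy number cannot be inferred from read content alone.
short_reads_differ = distinguishable(10)

# Reads long enough to span the whole tract plus both flanks do differ.
long_reads_differ = distinguishable(len(gene_5_copies))

print(short_reads_differ, long_reads_differ)
```

In this toy setting, reads of length 10 cannot separate the two copy numbers, whereas reads that span the full repeat tract and its flanks can. Real assemblers also use coverage depth and read pairing, which this sketch ignores.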