Haplotype-Phased Genome Assembly of Virulent Phytophthora ramorum Isolate ND886 Facilitated by Long-Read Sequencing Reveals Effector Polymorphisms and Copy Number Variation.

Affiliations

1. Computational Genomics Lab, Structural Biology and Bioinformatics Division, CSIR Indian Institute of Chemical Biology, Kolkata, 700032, India.

2. Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India.

Publication information

Mol Plant Microbe Interact. 2019 Aug;32(8):1047-1060. doi: 10.1094/MPMI-08-18-0222-R. Epub 2019 Jun 10.

Abstract

Phytophthora ramorum is a destructive pathogen that causes sudden oak death. The genome sequence of P. ramorum isolate Pr102 was previously produced using Sanger reads and contained 12 Mb of gaps. However, isolate Pr102 had shown reduced aggressiveness and genome abnormalities. To produce an improved genome assembly for P. ramorum, we performed long-read sequencing of the highly aggressive P. ramorum isolate CDFA1418886 (abbreviated as ND886). Using the Pacific Biosciences (PacBio) sequencing platform, we generated a 60.5-Mb assembly of the ND886 genome comprising 302 primary contigs (60.2 Mb) and nine unplaced contigs (265 kb). Additionally, we identified a 'highly repetitive' component among the PacBio reads that were neither assembled nor mapped; this component contains tandem repeats that are not part of the 60.5-Mb genome. The overall repeat content of the primary assembly was much higher than that of the Pr102 Sanger version (48 versus 29%), indicating that the long reads captured repetitive regions effectively. Phasing the 302 primary contigs yielded 345 haplotype blocks and 222,892 phased variants; the longest phased block was 1,513,201 bp and contained 7,265 phased variants. The improved phased assembly facilitated the identification of 21 and 25 Crinkler effectors and 393 and 394 RXLR effector genes from the two haplotypes. Of these, 24 and 25 RXLR effectors were newly predicted from haplotypes A and B, respectively. In addition, seven previously unreported paralogs of effector Avh207 were found in contig 54. Comparison of the ND886 assembly with the Pr102 V1 assembly suggests that several repeat-rich smaller scaffolds within the Pr102 V1 assembly were possibly misassembled; these regions are now fully encompassed in ND886 contigs. Our analysis further reveals that Pr102 is a heterokaryon with multiple nuclear types in the sequences corresponding to contig 10 of the ND886 assembly.
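The repeat-content comparison (48 versus 29%) is a ratio of repeat-annotated bases to assembled bases. As a minimal sketch, assuming repeats have been soft-masked in lowercase (an output convention that repeat-masking tools such as RepeatMasker can produce) and that gap bases (N) are excluded from the denominator, the percentage could be computed as below. The function names and the file name are hypothetical, and this is not necessarily how the authors computed their figures.

```python
"""Minimal sketch: estimate repeat content of an assembly from a soft-masked FASTA.

Assumption: repeats are soft-masked (lowercase letters); N/n marks gaps.
Function and file names are hypothetical, for illustration only.
"""
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def repeat_fraction(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return counted bases, soft-masked bases and percent repeat content.

    Gap bases (N or n) are skipped so that assemblies with very
    different gap content (e.g. 12 Mb of gaps) are compared on
    assembled sequence only; this is an assumed design choice.
    """
    total = masked = 0
    for _, seq in records:
        for base in seq:
            if base in "Nn":
                continue  # gap/unknown position: not counted at all
            total += 1
            if base.islower():
                masked += 1  # lowercase = soft-masked repeat
    pct = 100.0 * masked / total if total else 0.0
    return {"total_bp": total, "masked_bp": masked, "repeat_pct": pct}


if __name__ == "__main__":
    # Hypothetical file name; replace with a real soft-masked assembly.
    with open("nd886_primary.softmasked.fa") as fh:
        print(repeat_fraction(read_fasta(fh)))
```

On a toy input of two contigs, "ACGTacgtAC" and "NNnnggccAA", the four N/n bases are skipped, leaving 16 counted bases of which 8 are lowercase, i.e. 50% repeat content. Excluding gaps from the denominator matters when comparing a gapped Sanger assembly with a long-read assembly; counting N bases instead would lower the apparent repeat percentage of the gappier assembly.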
