Department of Biological Sciences, National University of Singapore, Singapore 117558, Singapore.
Plant Commun. 2020 Jun 24;1(4):100089. doi: 10.1016/j.xplc.2020.100089. eCollection 2020 Jul 13.
The nucleotide-binding domain and leucine-rich repeat (NLR) gene family is highly expanded in the plant lineage with extensive sequence and structure polymorphisms. To survey the landscape of NLR expansion, we mined the published long-read data generated by resistance gene enrichment sequencing of 64 diverse Arabidopsis thaliana accessions. We found that the hot spots of massive multi-gene NLR cluster expansion did not typically span the whole cluster; instead, they were restricted to a handful of, or only one, dominant radiation(s). All sequences in such a radiation were distinct from other genes in the cluster but not from each other in the clade, making it difficult to assign trustworthy reference-based orthologies when multiple reference genes were present in the radiation. Consequently, NLR genes can be broadly divided into two types: radiating or high-fidelity, where high-fidelity genes are well conserved and well separated from other clades. A similar distinction could be made for NLR clusters, depending on whether cluster size was determined primarily by extensive radiation or the presence of numerous high-fidelity genes. We also identified groups of well-conserved NLR clades that were missing from the Columbia-0 reference genome. This suggests that the classification of NLRs using gene IDs from a single reference accession can rarely capture all major paralogs in a cluster accurately and representatively and that a reference-agnostic perspective is required to properly characterize these additional variations. Finally, we present a quantitative visualization method for differentiating these situations in a given clade of interest.
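The distinction between radiating and high-fidelity clades can be illustrated with a minimal sketch. This is not the quantitative visualization method of the paper, which the abstract does not describe; it is a toy heuristic that assumes clade membership has already been assigned from a phylogeny. The function names, the accession labels `acc_A` and `acc_B`, the gene IDs, and the `max_copies` threshold of 1.5 are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def summarize_clade(members, reference_accession="Col-0"):
    """Summarize one clade given a list of (accession, gene_id) tuples."""
    per_accession = Counter(accession for accession, _ in members)
    n_accessions = len(per_accession)
    mean_copies = len(members) / n_accessions if n_accessions else 0.0
    return {
        "n_genes": len(members),
        "n_accessions": n_accessions,
        "mean_copies_per_accession": mean_copies,
        # Number of clade members that come from the reference accession.
        "n_reference_genes": per_accession.get(reference_accession, 0),
    }


def label_clade(summary, max_copies=1.5):
    """Assign a rough label; max_copies is an arbitrary illustrative cutoff."""
    if summary["n_reference_genes"] == 0:
        # No reference gene: reference gene IDs cannot name this clade.
        return "absent-from-reference"
    if (summary["n_reference_genes"] == 1
            and summary["mean_copies_per_accession"] <= max_copies):
        # Roughly one copy per accession and a single reference gene.
        return "high-fidelity-like"
    # Several copies per accession or several reference genes in one clade.
    return "radiation-like"


# Toy clades with made-up accessions and gene IDs (assumptions, not real data).
clade_a = [("Col-0", "g1"), ("acc_A", "g2"), ("acc_B", "g3")]
clade_b = [("Col-0", "g4"), ("Col-0", "g5"), ("acc_A", "g6"),
           ("acc_A", "g7"), ("acc_A", "g8"), ("acc_B", "g9")]
clade_c = [("acc_A", "g10"), ("acc_B", "g11")]

for name, clade in [("a", clade_a), ("b", clade_b), ("c", clade_c)]:
    print(name, label_clade(summarize_clade(clade)))
```

Copy number per accession is only a proxy here. A fuller treatment would also compare within-clade and between-clade sequence divergence, since the abstract defines radiations by sequences that are distinct from other genes in the cluster but not from each other.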