The CRISPR Spacer Space Is Dominated by Sequences from Species-Specific Mobilomes.
Author affiliations
Skolkovo Institute of Science and Technology, Skolkovo, Russian Federation.
National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA.
Publication information
mBio. 2017 Sep 19;8(5):e01397-17. doi: 10.1128/mBio.01397-17.
Clustered regularly interspaced short palindromic repeats and CRISPR-associated protein (CRISPR-Cas) systems store the memory of past encounters with foreign DNA in unique spacers that are inserted between direct repeats in CRISPR arrays. For only a small fraction of the spacers, homologous sequences, called protospacers, are detectable in viral, plasmid, and microbial genomes. The rest of the spacers remain the CRISPR "dark matter." We performed a comprehensive analysis of the spacers from all CRISPR-cas loci identified in bacterial and archaeal genomes, and we found that, depending on the CRISPR-Cas subtype and the prokaryotic phylum, protospacers were detectable for 1% to about 19% of the spacers (~7% global average). Among the detected protospacers, the majority, typically 80 to 90%, originated from viral genomes, including proviruses, and among the rest, the most common source was genes that are integrated into microbial chromosomes but are involved in plasmid conjugation or replication. Thus, almost all spacers with identifiable protospacers target mobile genetic elements (MGE). The GC content, as well as dinucleotide and tetranucleotide compositions, of microbial genomes, their spacer complements, and the cognate viral genomes showed a nearly perfect correlation and were almost identical. Given the near absence of self-targeting spacers, these findings are most compatible with the possibility that the spacers, including the dark matter, are derived almost completely from the species-specific microbial mobilomes.
The principal function of CRISPR-Cas systems is thought to be protection of bacteria and archaea against viruses and other parasitic genetic elements. The CRISPR defense function is mediated by sequences from parasitic elements, known as spacers, that are inserted into CRISPR arrays and then transcribed and employed as guides to identify and inactivate the cognate parasitic genomes. However, only a small fraction of the CRISPR spacers match any sequences in the current databases, and of these, only a minority correspond to known parasitic elements. We show that nearly all spacers with matches originate from viral or plasmid genomes that are either free or have been integrated into the host genome. We further demonstrate that spacers with no matches have the same properties as those of identifiable origins, strongly suggesting that all spacers originate from mobile elements.
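The abstract compares GC content and dinucleotide and tetranucleotide compositions across host genomes, spacer sets, and viral genomes. As an illustration of what such a comparison involves, here is a minimal Python sketch. It is not the authors' pipeline: the function names (`gc_content`, `kmer_profile`, `pearson`) are hypothetical, the use of a Pearson correlation is an assumption, and the sequences in the usage example are made-up toy strings, not real data.

```python
from collections import Counter
from itertools import product
from math import sqrt


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    return (counts["G"] + counts["C"]) / total if total else 0.0


def kmer_profile(seq, k):
    """Relative frequency of every possible k-mer, in lexicographic order.

    k=2 gives a dinucleotide profile (16 values),
    k=4 gives a tetranucleotide profile (256 values).
    Windows containing ambiguous bases such as N are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy sequences for illustration only (not taken from the paper).
    host = "ATGCGCGATATCGCGTTAGCGCATATGCGC"
    spacers = "GCGCGATATCGCATTAGCGCGTATAGCGCA"
    print("host GC:", round(gc_content(host), 3))
    print("spacer GC:", round(gc_content(spacers), 3))
    print("dinucleotide correlation:",
          round(pearson(kmer_profile(host, 2), kmer_profile(spacers, 2)), 3))
```

In a real analysis the profiles would be computed over whole genomes and pooled spacer sets, for which a dedicated sequence-analysis library would be a more practical choice than plain string slicing.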