National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA; Rutgers, State University of New Jersey, Piscataway, New Jersey, USA.
Skolkovo Institute of Science and Technology, Skolkovo, Russia; Rutgers, State University of New Jersey, Piscataway, New Jersey, USA.
CRISPR J. 2020 Dec;3(6):535-549. doi: 10.1089/crispr.2020.0062.
CRISPR-Cas systems typically consist of a CRISPR array and cas genes that are organized in one or more operons. However, a substantial fraction of CRISPR arrays are not adjacent to cas genes. Definitive identification of such isolated CRISPR arrays runs into the problem of false positives, with unrelated types of repetitive sequences mimicking CRISPR. We developed a computational pipeline to eliminate false CRISPR predictions and found that up to 25% of the CRISPR arrays in complete bacterial and archaeal genomes are located away from cas genes. Most of the repeats in these isolated arrays are identical to repeats in cas-adjacent CRISPR arrays in the same or closely related genomes, indicating an evolutionary relationship between isolated arrays and arrays in typical CRISPR-cas loci. The spacers in isolated CRISPR arrays show nearly as many matches to viral genomes as spacers from complete CRISPR-cas loci, suggesting that the isolated arrays were either functionally active recently or continue to function. Reconstruction of evolutionary events in closely related bacterial genomes suggests three routes of evolution of isolated CRISPR arrays: (1) loss of cas genes in a CRISPR-cas locus, (2) generation of arrays from off-target spacer integration into sequences resembling the corresponding repeats, and (3) transfer by mobile genetic elements. Both the combination of emerging arrays with cas genes and the regain of cas genes by isolated arrays via recombination likely contribute to functional diversification in CRISPR-Cas evolution.
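The distinction between isolated and cas-adjacent arrays can be illustrated with a minimal sketch. It assumes that array and cas gene coordinates are already available and that "adjacent" means lying within a fixed distance on the same contig. The 10 kb cutoff, the `Interval` type, and the function names are hypothetical choices for illustration; the abstract does not state the criteria used by the actual pipeline, and this sketch ignores circular replicons and does not attempt the false-positive filtering step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A feature on a contig, with 0-based start and exclusive end (assumed convention)."""
    contig: str
    start: int
    end: int


def gap(a: Interval, b: Interval) -> int:
    """Distance in bp between two intervals on the same contig; 0 if they overlap."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def classify_arrays(arrays, cas_genes, max_gap=10_000):
    """Label each array 'cas-adjacent' or 'isolated'.

    max_gap is an illustrative threshold, not a value taken from the paper.
    """
    labels = {}
    for arr in arrays:
        near_cas = any(
            g.contig == arr.contig and gap(arr, g) <= max_gap
            for g in cas_genes
        )
        labels[arr] = "cas-adjacent" if near_cas else "isolated"
    return labels


def isolated_fraction(labels):
    """Fraction of arrays labelled 'isolated' (0.0 for an empty input)."""
    if not labels:
        return 0.0
    return sum(v == "isolated" for v in labels.values()) / len(labels)
```

Applied to every complete genome in a collection, `isolated_fraction` would give a figure comparable in kind to the "up to 25%" reported above, although the exact value depends on the adjacency threshold and on how false CRISPR predictions are removed beforehand.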