Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, 143025 Skolkovo, Russia.
National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894.
Proc Natl Acad Sci U S A. 2018 Jun 5;115(23):E5307-E5316. doi: 10.1073/pnas.1803440115. Epub 2018 May 21.
The CRISPR-Cas systems of bacterial and archaeal adaptive immunity consist of direct repeat arrays separated by unique spacers and multiple CRISPR-associated (cas) genes encoding proteins that mediate all stages of the CRISPR response. In addition to the relatively small set of core cas genes that are typically present in all CRISPR-Cas systems of a given (sub)type and are essential for the defense function, numerous other genes occur in CRISPR-cas loci only sporadically. Some of these have been shown to perform various ancillary roles in the CRISPR response, but the functional relevance of most remains unknown. We developed a computational strategy for systematically detecting genes that are likely to be functionally linked to CRISPR-Cas. The approach is based on a "CRISPRicity" metric that measures the strength of CRISPR association for all protein-coding genes from sequenced bacterial and archaeal genomes. Uncharacterized genes with CRISPRicity values comparable to those of cas genes are considered candidate CRISPR-linked genes. We describe additional criteria to predict functional relevance for genes in the candidate set and identify 79 genes as strong candidates for functional association with CRISPR-Cas systems. A substantial majority of these CRISPR-linked genes reside in type III CRISPR-cas loci, which implies exceptional functional versatility of type III systems. Numerous candidate CRISPR-linked genes encode integral membrane proteins, suggestive of tight membrane association of CRISPR-Cas systems, whereas many others encode proteins implicated in various signal transduction pathways. These predictions provide ample material for improving the annotation of CRISPR-cas loci and for the experimental characterization of previously unsuspected aspects of CRISPR-Cas system functionality.
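The abstract does not give the formula for CRISPRicity, so the following is only a minimal sketch of the general idea under simplifying assumptions. Here the score for a gene family is the fraction of its occurrences that lie within a fixed window (a hypothetical 10 kb) of a CRISPR array on the same contig. A non-cas family is flagged as a candidate when its score is at least that of the weakest-scoring known cas family. The `Gene` and `Array` types, the window size, the threshold rule, and the toy data are all hypothetical illustrations, not the authors' definition.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    family: str  # hypothetical protein-family label
    contig: str
    start: int
    end: int


@dataclass(frozen=True)
class Array:
    contig: str  # contig carrying the CRISPR repeat array
    start: int
    end: int


WINDOW = 10_000  # hypothetical neighborhood size in bp (assumption, not from the paper)


def near_array(gene, arrays, window=WINDOW):
    """True if the gene lies within `window` bp of any array on the same contig."""
    for a in arrays:
        if a.contig != gene.contig:
            continue
        # Gap between the two intervals; 0 when they overlap.
        gap = max(a.start - gene.end, gene.start - a.end, 0)
        if gap <= window:
            return True
    return False


def crispr_association(genes, arrays, window=WINDOW):
    """Simplified association score: fraction of a family's genes near an array."""
    total = defaultdict(int)
    near = defaultdict(int)
    for g in genes:
        total[g.family] += 1
        if near_array(g, arrays, window):
            near[g.family] += 1
    return {f: near[f] / total[f] for f in total}


def candidates(scores, cas_families):
    """Non-cas families scoring at least as high as the weakest known cas family."""
    cas_scores = [scores[f] for f in cas_families if f in scores]
    if not cas_scores:
        return []
    threshold = min(cas_scores)
    return sorted(f for f, s in scores.items()
                  if f not in cas_families and s >= threshold)


# Toy data (entirely hypothetical).
arrays = [Array("c1", 50_000, 51_000), Array("c2", 10_000, 11_000)]
genes = [
    Gene("cas1", "c1", 45_000, 46_000),    # 4 kb from array -> near
    Gene("cas1", "c2", 12_000, 13_000),    # 1 kb from array -> near
    Gene("cas2", "c1", 47_000, 48_000),    # near
    Gene("cas2", "c3", 5_000, 6_000),      # no array on c3 -> not near
    Gene("famX", "c1", 52_000, 53_000),    # near
    Gene("famX", "c3", 1_000, 2_000),      # not near
    Gene("famY", "c1", 200_000, 201_000),  # far from array
    Gene("famY", "c2", 100_000, 101_000),  # far from array
]

scores = crispr_association(genes, arrays)
print(scores)
print(candidates(scores, {"cas1", "cas2"}))
```

With this toy input, famX scores as high as the weakest cas family and is flagged, while famY, which never occurs near an array, is not. A real analysis would need a principled definition of "comparable" (for example, a score distribution estimated over many cas families) rather than this single minimum-based cutoff.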