Ranasinghe Weerakkody, Gillette Dorcie, Ho Alexis, Cho Hyuk, Choudhary Madhusudan
Department of Biological Sciences, Sam Houston State University, Huntsville, TX, USA.
Department of Surgery, The University of Iowa Hospitals and Clinics, Iowa City, IA, USA.
Bioinform Biol Insights. 2024 Oct 5;18:11779322241274961. doi: 10.1177/11779322241274961. eCollection 2024.
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is a naturally occurring genetic defense system in bacteria and archaea. It comprises a series of DNA sequence repeats separated by spacers derived from previous exposures to plasmids or phages. Further understanding and application of CRISPR systems have revolutionized our capacity for gene and genome editing in prokaryotes and eukaryotes. CRISPR systems are classified into 3 distinct types, type I, type II, and type III, each of which possesses an associated signature protein: Cas3, Cas9, and Cas10, respectively. Because CRISPR loci originated from earlier independent exposures to foreign genetic elements, their associated signature proteins have likely evolved rapidly. In addition, their functional domains might have experienced different selective pressures and therefore diverged to different extents in their amino acid sequences. We employed genomic, phylogenetic, and structure-function constraint analyses to reveal the evolutionary distribution, phylogenetic relationships, and structure-function constraints of the Cas3, Cas9, and Cas10 proteins. The results reveal that all 3 Cas proteins are highly represented in the phyla , , and , including both pathogenic and non-pathogenic species. Genomic analysis of homologous proteins demonstrates that the proteins share 30% to 50% amino acid identity; they therefore show low to moderate conservation and have evolved rapidly. Phylogenetic analysis shows that each of the 3 proteins originated monophyletically; however, evolutionary rates differed among the branches of the clades. Furthermore, structure-function constraint analysis reveals that both Cas3 and Cas9 proteins experience low to moderate levels of negative selection, and several protein domains of Cas3 and Cas9 are highly conserved. In contrast, most protein domains of Cas10 experience neutral or positive selection, which supports rapid genetic divergence and fewer structure-function constraints.
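The 30% to 50% amino acid identity reported for the homologous proteins is a pairwise measure taken over aligned sequences. The abstract does not state which tool or gap convention the authors used, so the following is only a minimal sketch under one common assumption: identity is counted over alignment columns in which neither sequence has a gap. The function name `percent_identity` and the two short sequences are hypothetical, made up for illustration; they are not Cas protein data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: columns containing a gap ('-') in either sequence are
    excluded from both the numerator and the denominator. Other
    conventions (e.g. dividing by the full alignment length) give
    different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns where both sequences have a residue
    matches = 0   # compared columns where the residues are identical
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical toy alignment, not real Cas sequences.
toy_a = "MKT-AYIAKQ"
toy_b = "MRTGAY-ARQ"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity")
```

In this toy alignment, 8 columns are gap-free and 6 of them match, so the sketch reports 75.0% identity. Real homolog comparisons would start from a proper pairwise or multiple sequence alignment rather than hand-aligned strings.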