Toms Alice, Barrangou Rodolphe
Bioinformatics Research Center, North Carolina State University, Raleigh, NC, 27695, USA.
Center for Integrated Fungal Research, Department of Entomology and Plant Pathology, North Carolina State University, Raleigh, NC, 27695, USA.
Biol Direct. 2017 Aug 29;12(1):20. doi: 10.1186/s13062-017-0193-2.
Much effort is underway to build and upgrade databases and tools for cataloguing the occurrence, diversity, and characteristics of CRISPR-Cas systems. As microbial communities and their genome complements are unearthed, much emphasis has been placed on the details of individual strains and model systems within the CRISPR-Cas classification. Taken as a whole, that collection of information affords the opportunity to analyze CRISPR-Cas systems from a quantitative perspective and to gain insight into the distribution of CRISPR array sizes across the different classes, types, and subtypes. Studies of CRISPR diversity, nomenclature, occurrence, and biological functions have generated a plethora of data, creating a need to understand the size and distribution of these systems in order to appreciate their features and complexity.
Using a statistical framework and visual analytic techniques, we tested several hypotheses about CRISPR loci in bacterial Class I systems. Although CRISPR loci can expand to hundreds of spacers, the mean and median array sizes are 40 and 25 spacers, respectively, reflecting rather modest acquisition and/or retention overall. Histograms showed that CRISPR array size follows a parametric distribution, which was confirmed by a goodness-of-fit test. Mapping the frequency of CRISPR loci on a standardized chromosome plot revealed that CRISPRs have a higher probability of occurring at clustered locations along the positive or negative strand. Lastly, when multiple arrays occur in a particular system, the size of a given CRISPR array varies with its distance from the cas operon, reflecting acquisition and expansion biases.
This study establishes that bacterial Class I CRISPR array size tends to follow a geometric distribution; that these CRISPRs are not randomly distributed along the chromosome; and that the CRISPR array closest to the cas genes is typically larger than loci in trans. Overall, we provide an analytical framework to understand the features and behavior of CRISPR-Cas systems through a quantitative lens.
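As an illustration of what fitting a geometric distribution to array sizes and checking the fit might look like, the following is a minimal sketch, not the procedure used in the study (the abstract does not specify the test or the software). It assumes array size is counted in spacers with support k = 1, 2, ..., uses the maximum-likelihood estimate p = 1/mean, and computes a Pearson chi-square statistic over hand-picked bins. The function names, the bin edges, and the simulated data are all hypothetical.

```python
import math
import random


def fit_geometric(sizes):
    """MLE of p for a geometric distribution on k = 1, 2, ... (p_hat = 1 / mean)."""
    mean = sum(sizes) / len(sizes)
    return 1.0 / mean


def geometric_median(p):
    """Smallest k with P(X <= k) >= 0.5, where P(X <= k) = 1 - (1 - p)**k."""
    return math.ceil(math.log(0.5) / math.log(1.0 - p))


def chi_square_statistic(sizes, p, edges):
    """Pearson chi-square statistic over bins (edges[i], edges[i+1]].

    The last bin is open-ended: (edges[-1], infinity).
    """
    n = len(sizes)
    stat = 0.0
    for i, lo in enumerate(edges):
        hi = edges[i + 1] if i + 1 < len(edges) else math.inf
        observed = sum(1 for s in sizes if lo < s <= hi)
        # P(lo < X <= hi) = (1 - p)**lo - (1 - p)**hi for a geometric variable
        upper_tail = 0.0 if hi == math.inf else (1.0 - p) ** hi
        expected = n * ((1.0 - p) ** lo - upper_tail)
        stat += (observed - expected) ** 2 / expected
    return stat


def simulate_sizes(p, n, seed=0):
    """Draw n synthetic array sizes from a geometric distribution (illustration only)."""
    rng = random.Random(seed)
    # Inverse-CDF sampling; 1 - rng.random() lies in (0, 1], so log() is defined.
    return [int(math.log(1.0 - rng.random()) / math.log(1.0 - p)) + 1 for _ in range(n)]


if __name__ == "__main__":
    # A geometric distribution with mean 40 implies a median of 28 spacers.
    print(geometric_median(1 / 40))

    sizes = simulate_sizes(p=1 / 40, n=2000)          # synthetic data, not real arrays
    p_hat = fit_geometric(sizes)
    stat = chi_square_statistic(sizes, p_hat, edges=[0, 10, 25, 50, 100])
    print(p_hat, stat)
```

Under these assumptions, a mean of 40 spacers implies a geometric median of 28, which is in the same range as the reported median of 25. To turn the statistic into a p-value, one would compare it with a chi-square distribution with (number of bins - 2) degrees of freedom, since one parameter is estimated from the data.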
This article was reviewed by Eugene Koonin (NIH-NCBI) and Uri Gophna (Tel Aviv University).