Pavlova Yekaterina S, Paez-Espino David, Morozov Andrew Yu, Belalov Ilya S
Mathematics Department, Palomar College, San Marcos, California, United States of America.
Department of Energy, Joint Genome Institute, Walnut Creek, California, United States of America.
PLoS Comput Biol. 2021 Mar 26;17(3):e1008841. doi: 10.1371/journal.pcbi.1008841. eCollection 2021 Mar.
Understanding CRISPR-Cas systems (the adaptive defence mechanism that about half of bacterial species and most archaea use to neutralise viral attacks) is important for explaining the biodiversity observed in the microbial world as well as for editing animal and plant genomes effectively. The CRISPR-Cas system learns from previous viral infections and integrates short pieces of phage genomes, called spacers, into the microbial genome. The resulting library of spacers collected in CRISPR arrays is then compared with the DNA of potential invaders. One of the most intriguing and least well understood questions about CRISPR-Cas systems is how spacers are distributed across the microbial population. Here, using empirical data, we show that the global distribution of spacer numbers in CRISPR arrays across multiple biomes worldwide typically exhibits scale-invariant power law behaviour, with a standard deviation greater than the sample mean. We develop a mathematical model of spacer loss and acquisition dynamics that fits observed data from almost four thousand metagenomes well. By analogy with the classical 'rich-get-richer' mechanism of power law emergence, the rate of spacer acquisition is proportional to the CRISPR array size, which allows a small proportion of CRISPRs within the population to possess a significant number of spacers. Our study provides an alternative explanation for the rarity of all-resistant super microbes in nature and for why phage proliferation can be highly successful despite the effectiveness of CRISPR-Cas systems.
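The abstract does not give the model's equations. As an illustrative sketch only, the Python code below simulates Simon's classic 'rich-get-richer' process, assuming that each new spacer either founds a new array (probability `p_new`) or joins an existing array chosen in proportion to its current size. Spacer loss, which the authors' model includes, is omitted. The function names, parameter values and the tail-exponent estimator are assumptions for illustration, not the paper's method. The sketch shows how size-proportional acquisition alone can produce a heavy-tailed array-size distribution whose standard deviation exceeds its mean.

```python
import math
import random
import statistics


def simulate_array_sizes(steps: int, p_new: float, seed: int = 0) -> list[int]:
    """Illustrative 'rich-get-richer' growth of CRISPR array sizes (Simon's model).

    Assumptions (hypothetical, not the authors' model): at each step, with
    probability p_new a new array holding one spacer appears; otherwise one
    spacer is added to an existing array chosen with probability proportional
    to its size. Spacer loss is omitted for brevity.
    """
    rng = random.Random(seed)
    sizes = [1]   # spacer count of each array; start from a single one-spacer array
    owners = [0]  # one entry per spacer, recording the index of the array holding it
    for _ in range(steps):
        if rng.random() < p_new:
            sizes.append(1)
            owners.append(len(sizes) - 1)
        else:
            # Picking a spacer uniformly at random selects array i with
            # probability sizes[i] / sum(sizes), i.e. proportional to its size.
            i = rng.choice(owners)
            sizes[i] += 1
            owners.append(i)
    return sizes


def power_law_alpha(sizes: list[int], x_min: int) -> float:
    """Approximate discrete power-law MLE (Clauset, Shalizi and Newman, 2009) for x >= x_min."""
    tail = [x for x in sizes if x >= x_min]
    return 1 + len(tail) / sum(math.log(x / (x_min - 0.5)) for x in tail)


if __name__ == "__main__":
    sizes = simulate_array_sizes(steps=100_000, p_new=0.1, seed=1)
    mean = statistics.mean(sizes)
    sd = statistics.pstdev(sizes)
    top = sorted(sizes, reverse=True)[: max(1, len(sizes) // 100)]
    print(f"arrays={len(sizes)} mean={mean:.1f} sd={sd:.1f} sd>mean={sd > mean}")
    print(f"share of spacers held by the top 1% of arrays: {sum(top) / sum(sizes):.2f}")
    # Simon's model predicts a tail exponent of 1 + 1/(1 - p_new), about 2.11 here.
    print(f"estimated tail exponent: {power_law_alpha(sizes, x_min=10):.2f}")
```

Sampling a random spacer, rather than scanning all array sizes, keeps each step O(1). Raising `p_new` steepens the tail, because the predicted exponent 1 + 1/(1 - p_new) grows with `p_new`.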