Maddamsetti Rohan, Shyti Irida, Wilson Maggie L, Son Hye-In, Baig Yasa, Zhou Zhengqing, Lu Jia, You Lingchong
Center for Quantitative Biodesign, Duke University, Durham, NC, USA.
Department of Biomedical Engineering, Duke University, Durham, NC, USA.
Nat Commun. 2025 Jul 2;16(1):6023. doi: 10.1038/s41467-025-61205-2.
The capacity of a plasmid to express genes is constrained by its length and copy number. However, the interplay between these parameters and their constraints on plasmid evolution have remained elusive due to the absence of comprehensive quantitative analyses. Here, we present 'Pseudoalignment and Probabilistic Iterative Read Assignment' (pseuPIRA), a computational method that overcomes previous computational bottlenecks, enabling rapid and accurate determination of plasmid copy numbers at large scale. We apply pseuPIRA to all microbial genomes in the NCBI RefSeq database with linked short-read sequencing data (4644 bacterial and archaeal genomes including 12,006 plasmids). The analysis reveals three scaling laws of plasmids: first, an inverse power-law correlation between plasmid copy number and plasmid length; second, a positive linear correlation between protein-coding genes and plasmid length; and third, a positive correlation between metabolic genes per plasmid and plasmid length, particularly for large plasmids. These scaling laws imply fundamental constraints on plasmid evolution and functional organization, indicating that as plasmids increase in length, they converge toward chromosomal characteristics in copy number and functional content.
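As an illustration of what the first scaling law means in practice, an inverse power law between copy number and length appears as a straight line with negative slope on log-log axes, so its exponent can be estimated by ordinary linear regression on the log-transformed values. The sketch below is not the authors' analysis and does not use pseuPIRA output. The function name `fit_power_law`, the synthetic data, and the values `true_b` and `true_c` are assumptions chosen only for demonstration.

```python
import numpy as np


def fit_power_law(lengths, copy_numbers):
    """Fit copy_number ~ c * length**(-b) by least squares in log-log space.

    Returns (b, c). A positive b corresponds to an inverse power law.
    """
    log_len = np.log10(np.asarray(lengths, dtype=float))
    log_cn = np.log10(np.asarray(copy_numbers, dtype=float))
    # A degree-1 polynomial fit returns (slope, intercept) of the log-log line.
    slope, intercept = np.polyfit(log_len, log_cn, deg=1)
    return -slope, 10.0 ** intercept


# Synthetic data only: these numbers are arbitrary and NOT taken from the paper.
rng = np.random.default_rng(0)
true_b, true_c = 0.8, 5.0e3                       # hypothetical exponent and prefactor
lengths = 10 ** rng.uniform(3, 6, size=500)       # plasmid lengths from 1 kb to 1 Mb
noise = rng.normal(0.0, 0.1, size=500)            # scatter in log10(copy number)
copy_numbers = true_c * lengths ** (-true_b) * 10 ** noise

b_hat, c_hat = fit_power_law(lengths, copy_numbers)
print(f"estimated exponent b = {b_hat:.2f}, prefactor c = {c_hat:.0f}")
```

Fitting in log-log space treats the scatter as multiplicative, which is a common simplifying choice for quantities that span several orders of magnitude. Other estimators may be more appropriate for real data.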