Ajay Akash, Begum Tina, Arya Ajay, Kumar Krishan, Ahmad Shandar
School of Environmental Sciences, Jawaharlal Nehru University, New Delhi 110067, India; School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India.
School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India.
Comput Biol Chem. 2024 Oct;112:108107. doi: 10.1016/j.compbiolchem.2024.108107. Epub 2024 May 22.
Spontaneous mutations are engines of evolution: they generate the variants on which downstream evolutionary processes act to produce speciation and adaptation. Single nucleotide mutations (SNM) are the most abundant type among them. Here, we perform a meta-analysis to quantify the influence of selected global genomic parameters (genome size, genomic GC content, genomic repeat fraction, number of coding genes, gene count, and strand bias in prokaryotes) and local genomic features (local GC content, repeat content, CpG content, and the number of SNM at CpG islands) on spontaneous SNM rates across the tree of life (prokaryotes, unicellular eukaryotes, multicellular eukaryotes), using wild-type sequence data under two different taxon classification systems. We find that the spontaneous SNM rates in our data are correlated with many genomic features in prokaryotes and unicellular eukaryotes, irrespective of their sample sizes. In multicellular eukaryotes, by contrast, only the number of coding genes is correlated with the spontaneous SNM rates, a correlation contributed primarily by vertebrate data. Among local features, local GC content and CpG content are significantly correlated with the spontaneous SNM rates in unicellular eukaryotes, while local repeat fraction is an important feature in prokaryotes and in certain specific uni- and multicellular eukaryotes. For these predictive features, non-linear models are often a better fit to the spontaneous SNM rates than the linear model. We also observe that strand asymmetry in prokaryotes plays an important role in determining the spontaneous SNM rates, whereas the SNM spectrum does not.
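The comparison of linear and non-linear fits mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which non-linear model family or which selection criterion the authors used, so the quadratic model, the Akaike information criterion (AIC), the helper `aic_polyfit`, and the synthetic data below are all assumptions made for illustration only, not the paper's method or results.

```python
import numpy as np

def aic_polyfit(x, y, degree):
    """AIC of a least-squares polynomial fit (hypothetical helper).

    Uses the Gaussian-error form AIC = n*ln(RSS/n) + 2k, where k counts
    the polynomial coefficients plus the error variance.
    """
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    n = len(y)
    rss = float(np.sum(residuals ** 2))
    k = degree + 2
    return n * np.log(rss / n) + 2 * k

# Synthetic data (NOT from the paper): a curved relationship between a
# genomic feature and the log mutation rate, plus a little noise.
rng = np.random.default_rng(0)
log_genome_size = np.linspace(0.0, 3.0, 40)
log_snm_rate = (-9.0 - 1.2 * log_genome_size
                + 0.5 * log_genome_size ** 2
                + rng.normal(0.0, 0.05, size=40))

aic_linear = aic_polyfit(log_genome_size, log_snm_rate, degree=1)
aic_quadratic = aic_polyfit(log_genome_size, log_snm_rate, degree=2)

# The model with the lower AIC is preferred.
best = "quadratic" if aic_quadratic < aic_linear else "linear"
print(f"AIC linear={aic_linear:.1f}, quadratic={aic_quadratic:.1f}, best={best}")
```

A lower AIC indicates the preferred model once the extra parameter is penalized. On real data the outcome depends on the feature and the taxon group being fitted.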