Vergni Davide, Santoni Daniele
Istituto per le Applicazioni del Calcolo "Mauro Picone" - CNR, Via dei Taurini 19, 00185, Rome, Italy.
Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti" - CNR, Via dei Taurini 19, 00185, Rome, Italy.
PLoS One. 2016 Dec 1;11(12):e0164540. doi: 10.1371/journal.pone.0164540. eCollection 2016.
A nullomer is an oligomer that does not occur as a subsequence in a given DNA sequence, i.e., it is an absent word of that sequence. The importance of nullomers in several applications, from drug discovery to forensic practice, is currently debated in the literature. Here, we investigated the nature of nullomers: whether their absence from genomes has a purely statistical explanation or is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, namely high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied several aspects of them: comparison with nullomers of random sequences, CpG distribution, and mean helical rise. In agreement with previous results, we found that the number of nullomers in the human genome is much larger than expected by chance. Nevertheless, the opposite result was found when considering a random DNA sequence that preserves dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content, but it also highlighted a strong dependence of CpG frequency on dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences whose CpG frequency is biased. Furthermore, phylogenetic trees were built for eleven species based on both the similarity between dinucleotide frequencies and the number of nullomers that two species share, showing that nullomers are fairly conserved among closely related species. The study of the mean helical rise of nullomer sequences revealed significantly high mean rise values, reinforcing the hypothesis that those sequences have some peculiar structural features. The results show that nullomers are a consequence of the peculiar structure of DNA (including biased CpG frequency and CpG islands), so that the hypermutability model, even when CpG islands are taken into account, seems insufficient to explain the nullomer phenomenon.
Finally, high order nullomers could emphasize those features that already make simple nullomers useful in several applications.
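The two definitions in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: words are taken as contiguous substrings over the alphabet ACGT, only one strand is scanned (no reverse complements), and a "mutated sequence" is read as a single-nucleotide substitution, giving what is called here a first-order nullomer. The function names are hypothetical, and this is not the authors' implementation.

```python
from itertools import product

ALPHABET = "ACGT"


def nullomers(sequence, k):
    """Return the set of k-mers over ACGT that never occur in `sequence`."""
    # Collect every k-mer that is present as a contiguous substring.
    present = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    # Absent words = all possible k-mers minus the present ones.
    return {"".join(p) for p in product(ALPHABET, repeat=k)} - present


def single_substitutions(word):
    """Yield every word that differs from `word` at exactly one position."""
    for i, original in enumerate(word):
        for base in ALPHABET:
            if base != original:
                yield word[:i] + base + word[i + 1:]


def first_order_nullomers(sequence, k):
    """Nullomers whose single-substitution variants are all nullomers too.

    Assumption: 'mutated sequences' means single-nucleotide substitutions.
    """
    absent = nullomers(sequence, k)
    return {
        word for word in absent
        if all(variant in absent for variant in single_substitutions(word))
    }
```

For real genomes the exhaustive enumeration of all 4^k words is only practical for small k; the sketch is meant to make the definitions concrete, not to scale.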