van Hemert Formijn, Jebbink Maarten, van der Ark Andries, Scholer Frits, Berkhout Ben
Laboratory of Experimental Virology, Medical Microbiology, Amsterdam UMC, University of Amsterdam, Amsterdam, Netherlands.
Research Institute of Child Development and Education, University of Amsterdam, Amsterdam, Netherlands.
Comput Math Methods Med. 2018 Oct 30;2018:6490647. doi: 10.1155/2018/6490647. eCollection 2018.
Nucleotide skew analysis is a versatile method for studying the nucleotide composition of RNA/DNA molecules, in particular for revealing characteristic sequence signatures. For instance, skew analysis of several viral RNA genomes indicated that their nucleotide bias is enriched in the unpaired, single-stranded genome regions, creating an even more striking virus-specific signature. Comparing skew graphs across many virus isolates or families is difficult, time-consuming, and nonquantitative. Here, we present a procedure for simpler identification of similarities and dissimilarities between the nucleotide skew data of coronavirus, flavivirus, picornavirus, and HIV-1 RNA genomes. Window and step sizes were normalized to correct for differences in viral genome length. Cumulative skew data were converted into pairwise Euclidean distance matrices, which can be presented as neighbor-joining trees. We present skew value trees for the four virus families and show that closely related viruses are placed in small clusters. Importantly, the skew value trees are similar to trees constructed with a "classical" model of evolutionary nucleotide substitution. We therefore conclude that the simple calculation of Euclidean distances between nucleotide skew data allows an easy and quantitative comparison of the characteristic sequence signatures of virus genomes, and that this analysis is a useful addition to the virology toolbox.
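The abstract does not state which skew, window fraction, or step size was used. The following Python sketch therefore assumes GC skew, (G - C)/(G + C), a window equal to a fixed fraction of the genome length, and a fixed number of evenly spaced windows per genome. The function names `window_skews`, `cumulative_skew`, and `skew_distance_matrix` and the parameters `n_windows` and `window_frac` are hypothetical. The sketch shows how length-normalized windows turn genomes of different lengths into cumulative skew vectors of equal length, and how a pairwise Euclidean distance matrix is then built from those vectors.

```python
import math
from itertools import accumulate


def window_skews(seq, x="G", y="C", n_windows=50, window_frac=0.05):
    """Return n_windows values of the (x - y) / (x + y) skew along seq.

    Hypothetical normalization: the window is a fixed fraction of the genome
    length and the start positions are spread evenly from the first to the
    last possible position, so every genome yields the same number of values.
    """
    seq = seq.upper()
    length = len(seq)
    if length == 0 or n_windows < 2 or not 0 < window_frac <= 1:
        raise ValueError("need a non-empty sequence, n_windows >= 2, 0 < window_frac <= 1")
    window = max(1, round(window_frac * length))
    last_start = length - window
    skews = []
    for k in range(n_windows):
        # Evenly spaced start positions: the step scales with genome length.
        start = round(k * last_start / (n_windows - 1))
        chunk = seq[start:start + window]
        nx, ny = chunk.count(x), chunk.count(y)
        # Windows containing neither nucleotide get a neutral skew of 0.
        skews.append((nx - ny) / (nx + ny) if nx + ny else 0.0)
    return skews


def cumulative_skew(seq, **kwargs):
    """Running sum of the window skews (the cumulative skew profile)."""
    return list(accumulate(window_skews(seq, **kwargs)))


def skew_distance_matrix(genomes, **kwargs):
    """Pairwise Euclidean distances between cumulative skew profiles."""
    names = list(genomes)
    profiles = {name: cumulative_skew(genomes[name], **kwargs) for name in names}
    matrix = [[math.dist(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix


# Toy sequences, not real viral genomes: virus_b has virus_a's composition
# but is 20% longer; virus_c has the opposite G/C bias.
genomes = {
    "virus_a": "GGGCAT" * 50,
    "virus_b": "GGGCAT" * 60,
    "virus_c": "CCCGAT" * 50,
}
names, dm = skew_distance_matrix(genomes, n_windows=20, window_frac=0.1)
for name, row in zip(names, dm):
    print(name, [round(d, 2) for d in row])
```

With these toy inputs, virus_a and virus_b are at distance 0 despite their different lengths, because the normalized windows see the same composition, while virus_c lies far from both. To draw a neighbor-joining tree, the matrix could be passed to an existing implementation, for example the `nj` method of Biopython's `DistanceTreeConstructor`, after conversion to the lower-triangular `DistanceMatrix` that module expects.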