Fonseca I C, Nogueira E, Figueirêdo P H, Coutinho S
Departamento de Física, Universidade Federal da Paraíba, 58051-970, João Pessoa, PB, Brazil.
Departamento de Física, Universidade Federal Rural de Pernambuco, 52171-900, Recife, PE, Brazil.
Eur Phys J E Soft Matter. 2018 Jan 19;41(1):8. doi: 10.1140/epje/i2018-11609-8.
This article investigates similarity between complete mitochondrial DNA sequences by determining the distribution of the relative frequencies of words of different lengths and characterizing their relevance throughout the sequences. The degree of similarity is obtained by comparing the distances between words contained within these sequences. Our results indicate that the best groupings among different species depend on the word length and the respective relative frequencies. We also observed that the longer the word, the more consistent the grouping of the sequences becomes. The application of our results, together with the prospect of analyzing DNA sequences belonging to a single biological species, may be important for the construction of phylogenetic trees, which are appropriate structures for understanding the evolutionary history of species.
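As a rough illustration of comparing sequences by word frequencies, the sketch below counts overlapping words of length k in each sequence, converts the counts to relative frequencies, and compares the resulting profiles pairwise. Several things here are assumptions and not taken from the article: that a "word" is a contiguous run of k nucleotides, that the Euclidean distance between frequency profiles is used (the abstract does not specify the distance measure), the function names, and the toy sequences, which are invented and are not mitochondrial data.

```python
from collections import Counter
from itertools import combinations
import math


def word_frequencies(seq, k):
    """Relative frequency of each overlapping word of length k in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    # If seq is shorter than k, counts is empty and so is the result.
    return {word: n / total for word, n in counts.items()}


def frequency_distance(f1, f2):
    """Euclidean distance between two word-frequency profiles.

    Assumed metric for illustration; not necessarily the authors' measure.
    """
    words = set(f1) | set(f2)
    return math.sqrt(sum((f1.get(w, 0.0) - f2.get(w, 0.0)) ** 2 for w in words))


def pairwise_distances(seqs, k):
    """Distance between every pair of named sequences for word length k."""
    profiles = {name: word_frequencies(s, k) for name, s in seqs.items()}
    return {
        (x, y): frequency_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Hypothetical toy sequences: "a" and "b" differ by a single base,
# while "c" has a very different composition.
sequences = {
    "a": "ACGTACGTACGTTTGA",
    "b": "ACGTACGTACGATTGA",
    "c": "GGGCCCGGGCCCAATT",
}

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, pairwise_distances(sequences, k))
```

Running the script for several values of k shows how the pairwise distances, and therefore any grouping built from them, change with word length. A distance table of this kind could then be passed to a standard clustering method to build a tree, though that step is outside this sketch.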