Generalization of entropy based divergence measures for symbolic sequence analysis.

Affiliations

Departamento de Ciencias Básicas, CIII - Facultad Regional Córdoba, Universidad Tecnológica Nacional, Córdoba, Argentina; Facultad de Matemática, Astronomía y Física, Universidad Nacional de Córdoba, Córdoba, Argentina.

Department of Biological Sciences, University of North Texas, Denton, Texas, United States of America; Department of Mathematics, University of North Texas, Denton, Texas, United States of America.

Publication information

PLoS One. 2014 Apr 11;9(4):e93532. doi: 10.1371/journal.pone.0093532. eCollection 2014.

Abstract

Entropy-based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of the Kullback-Leibler divergence (relative entropy), the Jensen-Shannon divergence (JSD), is of particular interest because it shares properties with families of other divergence measures and is interpretable in different domains, including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise from a number of attributes, including its generalization to any number of probability distributions and the association of weights with the distributions. Furthermore, its entropic formulation allows its generalization to different statistical frameworks, such as non-extensive Tsallis statistics and higher-order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes, including those of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. Finally, the proposed Tsallis-Markovian generalization yielded more pronounced improvements than the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms.
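The JSD described above is a weighted difference of entropies, H(sum_i pi_i P_i) - sum_i pi_i H(P_i), and its generalizations come from substituting a different entropy H. The Python sketch below illustrates that substitution idea under stated assumptions; it is not the authors' implementation. All function names are hypothetical. The Tsallis term uses the plain weights pi_i; the paper's exact weighting may differ (for instance pi_i^q, which arises from Tsallis conditional entropy). The order-m "Markovian" entropy is assumed to be S(W, X) - S(W), computed over overlapping (m+1)-mers X with length-m prefixes W. Default weights are assumed proportional to the number of (m+1)-mers in each sequence. Entropies are in nats. With q = 1 and m = 0, the function reduces to the standard JSD of symbol compositions.

```python
from __future__ import annotations

from collections import Counter
from math import log


def kmer_distribution(seq: str, k: int) -> dict[str, float]:
    """Relative frequencies of overlapping k-mers in seq (requires len(seq) >= k)."""
    if len(seq) < k:
        raise ValueError("sequence shorter than word length")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def prefix_marginal(joint: dict[str, float]) -> dict[str, float]:
    """Marginalize an (m+1)-mer distribution onto its length-m prefixes W."""
    marg: dict[str, float] = {}
    for w, p in joint.items():
        marg[w[:-1]] = marg.get(w[:-1], 0.0) + p
    return marg


def entropy(dist: dict[str, float], q: float = 1.0) -> float:
    """Shannon entropy in nats when q == 1, otherwise Tsallis entropy S_q."""
    probs = [p for p in dist.values() if p > 0]
    if abs(q - 1.0) < 1e-12:
        return -sum(p * log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def conditional_entropy(joint: dict[str, float], q: float = 1.0) -> float:
    """Entropy of the next symbol given the m preceding ones: S(W, X) - S(W).

    For q != 1 this equals the Tsallis conditional entropy by the Tsallis
    chain rule; for m == 0 the prefix is empty and this is just S(X).
    """
    return entropy(joint, q) - entropy(prefix_marginal(joint), q)


def mixture(dists: list[dict[str, float]], weights: list[float]) -> dict[str, float]:
    """Weighted mixture sum_i pi_i * P_i of word distributions."""
    mix: dict[str, float] = {}
    for dist, w in zip(dists, weights):
        for word, p in dist.items():
            mix[word] = mix.get(word, 0.0) + w * p
    return mix


def generalized_jsd(seqs: list[str], q: float = 1.0, m: int = 0,
                    weights: list[float] | None = None) -> float:
    """H(sum_i pi_i P_i) - sum_i pi_i H(P_i), with H the order-m (Tsallis) entropy.

    Hypothetical sketch. Assumes plain weights pi_i (not pi_i ** q) and,
    by default, pi_i proportional to the number of (m+1)-mers per sequence.
    """
    k = m + 1
    dists = [kmer_distribution(s, k) for s in seqs]
    if weights is None:
        sizes = [len(s) - k + 1 for s in seqs]
        total = sum(sizes)
        weights = [n / total for n in sizes]
    mix = mixture(dists, weights)
    return conditional_entropy(mix, q) - sum(
        w * conditional_entropy(d, q) for d, w in zip(dists, weights))


if __name__ == "__main__":
    pair = ["ATATATAT", "AATTAATT"]             # same base composition
    print(generalized_jsd(pair))                 # standard JSD on single bases
    print(generalized_jsd(pair, m=1))            # Markovian, order 1
    print(generalized_jsd(pair, q=2.0, m=1))     # hypothetical Tsallis-Markovian combination
```

The two example sequences have identical base composition, so the order-0 JSD is essentially zero. Their dinucleotide structure differs, however, so the order-1 value is positive. This shows why higher-order statistics can separate sequences whose composition alone cannot.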
