Large-scale k-mer-based analysis of the informational properties of genomes, comparative genomics and taxonomy.

Author information

Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel.

Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel.

Publication information

PLoS One. 2021 Oct 14;16(10):e0258693. doi: 10.1371/journal.pone.0258693. eCollection 2021.

Abstract

Information-theoretic approaches are ubiquitous and effective in a wide variety of bioinformatics applications. In comparative genomics, alignment-free methods based on short DNA words, or k-mers, are particularly powerful. We evaluated the utility of varying k-mer lengths for genome comparisons by analyzing their sequence space coverage of 5805 genomes in the KEGG GENOME database. In subsequent analyses on four k-mer lengths spanning the relevant range (11, 21, 31 and 41), hierarchical clustering of 1634 genus-level representative genomes using pairwise 21- and 31-mer Jaccard similarities best recapitulated a phylogenetic/taxonomic tree of life, with clear boundaries for superkingdom domains and high subtree similarity for named taxa at lower levels (family through phylum). By analyzing ~14.2M prokaryotic genome comparisons by their lowest-common-ancestor taxon levels, we detected many potential misclassification errors in a curated database, further demonstrating the need for wide-scale adoption of quantitative taxonomic classifications based on whole-genome similarity.
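The two quantities named in the abstract, k-mer sequence space coverage and pairwise Jaccard similarity, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `kmer_set`, `jaccard` and `space_coverage` are hypothetical, only forward-strand k-mers are counted (no reverse-complement canonicalization), and coverage is assumed here to mean the fraction of the 4^k possible k-mers that occur at least once.

```python
def kmer_set(seq, k):
    """Return the set of distinct k-mers in seq (forward strand only).

    Assumption for this sketch: windows containing anything other than
    A, C, G or T (for example N) are skipped.
    """
    seq = seq.upper()
    valid = set("ACGT")
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:
            kmers.add(kmer)
    return kmers


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two k-mer sets."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def space_coverage(kmers, k):
    """Fraction of the 4**k possible k-mers present in the set."""
    return len(kmers) / 4 ** k


if __name__ == "__main__":
    # Toy sequences, far shorter than any real genome.
    a = kmer_set("ACGTACGT", 3)
    b = kmer_set("ACGTTTTT", 3)
    print(jaccard(a, b))
    print(space_coverage(a, 3))
```

With real genomes the sets for k = 21 or 31 are large, so in practice such comparisons are usually done on sketches or with dedicated k-mer counting tools rather than plain Python sets; the pairwise similarity matrix would then feed a hierarchical clustering step.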
