KGCAK: a K-mer based database for genome-wide phylogeny and complexity evaluation.

Author information

Wang Dapeng, Xu Jiayue, Yu Jun

Affiliations

CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, PR China.

Stem Cell Laboratory, UCL Cancer Institute, University College London, London, WC1E 6BT, UK.

Publication information

Biol Direct. 2015 Sep 16;10:53. doi: 10.1186/s13062-015-0083-4.

DOI:10.1186/s13062-015-0083-4

PMID:26376976

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4573299/

Abstract

BACKGROUND

The K-mer approach, which treats genomic sequences as strings of simple characters and counts the relative abundance of each substring of a fixed length K, has been extensively applied to phylogeny inference as well as to genome assembly, annotation, and comparison.
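To make "counting the relative abundance of each string upon a fixed K" concrete, here is a minimal sketch. It slides a window of length K along one strand, skips windows containing characters other than A, C, G, T (such as N), and divides each count by the number of valid windows. Folding reverse complements into canonical K-mers, which many tools do, is deliberately left out; the function name `kmer_relative_abundance` is hypothetical and is not part of KGCAK.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_relative_abundance(seq: str, k: int) -> dict[str, float]:
    """Return the relative abundance of each K-mer observed in seq.

    Only the given strand is counted (no reverse-complement folding),
    and windows containing non-ACGT characters (e.g. N) are skipped.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= VALID_BASES
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    # Divide raw counts by the number of valid windows so values sum to 1.
    return {kmer: n / total for kmer, n in counts.items()}
```

For example, `kmer_relative_abundance("ACGTACGT", 2)` sees seven windows, so AC, CG and GT each get 2/7 and TA gets 1/7.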

RESULTS

To meet increasing demands for comparing large genome sequences and to promote the use of the K-mer approach, we developed a versatile database, KGCAK ( http://kgcak.big.ac.cn/KGCAK/ ), containing ~8,000 genomes, including genome sequences of diverse life forms (viruses, prokaryotes, protists, animals, and plants) as well as organelle genomes of eukaryotic lineages. It builds phylogenies from genomic elements in an alignment-free manner and provides in-depth data processing that enables users to compare the complexity of genome sequences based on their K-mer distributions.
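The two ideas in the results, alignment-free phylogeny and complexity comparison from K-mer distributions, can be illustrated with a small sketch. The abstract does not say which distance or complexity measure KGCAK uses, so the choices here are assumptions: each genome becomes a relative-abundance vector over all 4^K K-mers, genomes are compared by cosine distance, and Shannon entropy of the vector serves as a simple complexity proxy (at most 2K bits). All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int) -> list[float]:
    """Relative abundance over all 4**k K-mers, in lexicographic ACGT order."""
    valid = set("ACGT")
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values()) or 1  # avoid division by zero for empty input
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def cosine_distance(p: list[float], q: list[float]) -> float:
    """1 - cosine similarity; an assumed alignment-free distance."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm if norm else 1.0


def shannon_entropy(p: list[float]) -> float:
    """Shannon entropy in bits of a K-mer profile; an assumed complexity proxy."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def distance_matrix(genomes: dict[str, str], k: int) -> dict[tuple[str, str], float]:
    """Pairwise cosine distances between the K-mer profiles of named sequences."""
    profiles = {name: kmer_profile(seq, k) for name, seq in genomes.items()}
    names = sorted(profiles)
    return {(a, b): cosine_distance(profiles[a], profiles[b])
            for a in names for b in names}
```

A distance matrix like this is the kind of input a distance-based tree method (for example neighbor joining or UPGMA) consumes; the tree-building step itself is not shown. Because the profile has 4^K entries, this dense representation is only practical for small K.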

CONCLUSION

We hope that KGCAK becomes a powerful tool for exploring relationships within and among groups of species in the tree of life based on genomic data.






