Wang Dapeng, Xu Jiayue, Yu Jun
CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, PR China.
Stem Cell Laboratory, UCL Cancer Institute, University College London, London, WC1E 6BT, UK.
Biol Direct. 2015 Sep 16;10:53. doi: 10.1186/s13062-015-0083-4.
The K-mer approach, which treats genomic sequences as simple character strings and counts the relative abundance of each substring of a fixed length K, has been extensively applied to phylogeny inference and to genome assembly, annotation, and comparison.
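To make the counting step concrete, here is a minimal sketch of computing relative K-mer abundance for a single sequence. The function name `kmer_frequencies` and the toy sequence are illustrative assumptions, and overlapping windows are assumed; this is not code from KGCAK.

```python
from collections import Counter


def kmer_frequencies(sequence, k):
    """Return the relative abundance of each overlapping k-mer in `sequence`.

    Hypothetical helper for illustration; overlapping windows are an assumption.
    """
    seq = sequence.upper()
    # Slide a window of width k over the sequence, one position at a time.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    # A sequence shorter than k yields no k-mers, so return an empty profile.
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Toy input: 7 overlapping 2-mers (AC, CG, GT, TA, AC, CG, GT).
freqs = kmer_frequencies("ACGTACGT", 2)
```

For real genomes one would typically also decide how to handle ambiguous bases such as `N` and whether to merge reverse complements; both choices are left out of this sketch.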
To meet the increasing demand for comparing large genome sequences and to promote the use of the K-mer approach, we developed a versatile database, KGCAK ( http://kgcak.big.ac.cn/KGCAK/ ), containing ~8,000 genomes that span diverse life forms (viruses, prokaryotes, protists, animals, and plants) as well as cellular organelles of eukaryotic lineages. It builds phylogenies from genomic elements in an alignment-free fashion and provides in-depth data processing that enables users to compare the complexity of genome sequences based on their K-mer distributions.
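As a rough illustration of what "alignment-free" means here, the sketch below compares sequences purely through their K-mer frequency profiles and produces pairwise distances that a tree-building method could consume. The Euclidean distance, the function names, and the toy sequences are all assumptions made for this example; the abstract does not state which distance measure KGCAK uses.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kmer_profile(sequence, k):
    """Relative abundance of overlapping k-mers (hypothetical helper)."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def profile_distance(p, q):
    """Euclidean distance between two k-mer profiles.

    Assumption: Euclidean distance is used only as a simple stand-in.
    """
    keys = set(p) | set(q)
    return sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def distance_matrix(sequences, k):
    """Pairwise distances for a dict of name -> sequence, without any alignment."""
    profiles = {name: kmer_profile(seq, k) for name, seq in sequences.items()}
    return {
        (a, b): profile_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Toy sequences, invented for illustration only.
toy = {"a": "ACGTACGTAC", "b": "ACGTACGTAC", "c": "TTTTTTTTTT"}
dist = distance_matrix(toy, 3)
```

The resulting pairwise distances could then be passed to a standard distance-based tree method such as neighbor joining; that step is omitted here.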
We hope that KGCAK will become a powerful tool for exploring relationships within and among groups of species in the tree of life based on genomic data.