Wang Ying, Chen Qi, Deng Chao, Zheng Yiluan, Sun Fengzhu
Department of Automation, Xiamen University, Xiamen, China.
Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision-Making, Xiamen, China.
Front Microbiol. 2020 Aug 25;11:2067. doi: 10.3389/fmicb.2020.02067. eCollection 2020.
Capturing group-specific sequences between two groups of genomic/metagenomic sequences is critical for the follow-up identification of single nucleotide variants (SNVs), gene families, microbial species, or other elements associated with each group. In our study, a sequence that is present, or rich, in one group but absent, or scarce, in the other group is considered a "group-specific" sequence. We developed a user-friendly tool, KmerGO, to identify group-specific sequences between two groups of genomic/metagenomic long sequences or high-throughput sequencing datasets. Compared with other tools, KmerGO captures group-specific k-mers (k up to 40 bps) with much lower computing resource requirements and in a much shorter running time. For a 1.05 TB dataset (.fasta), KmerGO takes about 21.5 h (including k-mer counting) to return assembled group-specific sequences on a regular stand-alone workstation using no more than 1 GB of memory. Furthermore, KmerGO can also be applied to capture trait-associated sequences for continuous traits. Using multi-process parallel computing, KmerGO is implemented with both a graphical user interface and a command line interface on Linux and Windows, free from any pre-installed supporting environments, packages, or complex configurations. The group-specific k-mers or sequences output by KmerGO can serve as inputs to other tools for the downstream discovery of biomarkers, such as genetic variants, species, or genes. KmerGO is available at https://github.com/ChnMasterOG/KmerGO.
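The definition of a "group-specific" sequence given above can be illustrated with a minimal Python sketch. It is not KmerGO's implementation: the function names, the presence-fraction criterion, and the `rich`/`scarce` thresholds are all assumptions chosen for illustration, and the toy sequences are made up. The sketch counts k-mers per sample and keeps those found in most samples of one group and in few samples of the other.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-mer in one sequence (forward strand only)."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def presence_fraction(samples, k):
    """Fraction of samples in a group that contain each k-mer at least once."""
    present = Counter()
    for sequence in samples:
        present.update(count_kmers(sequence, k).keys())
    return {kmer: n / len(samples) for kmer, n in present.items()}


def group_specific_kmers(group_a, group_b, k, rich=0.8, scarce=0.2):
    """Return (k-mers specific to A, k-mers specific to B).

    A k-mer is called specific to a group when it occurs in at least `rich`
    of that group's samples and in at most `scarce` of the other group's
    samples. Both thresholds are illustrative assumptions.
    """
    frac_a = presence_fraction(group_a, k)
    frac_b = presence_fraction(group_b, k)
    specific_a = sorted(
        kmer for kmer, f in frac_a.items()
        if f >= rich and frac_b.get(kmer, 0.0) <= scarce
    )
    specific_b = sorted(
        kmer for kmer, f in frac_b.items()
        if f >= rich and frac_a.get(kmer, 0.0) <= scarce
    )
    return specific_a, specific_b


# Toy input: each string stands in for one sample's sequence data.
group_a = ["ACGTACGT", "TTACGTAA"]
group_b = ["GGGCCCGG", "CCGGGCCC"]
only_a, only_b = group_specific_kmers(group_a, group_b, k=4)
print(only_a)  # ['ACGT', 'CGTA', 'TACG']
print(only_b)  # ['CCGG', 'GCCC', 'GGCC', 'GGGC']
```

This sketch holds all counts in memory and ignores reverse complements, so it is only suitable for small inputs; processing terabyte-scale data within about 1 GB of memory, as reported above, would need a disk-based k-mer counter and streaming comparison.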