
KmerGO: A Tool to Identify Group-Specific Sequences With k-mers.

Authors

Wang Ying, Chen Qi, Deng Chao, Zheng Yiluan, Sun Fengzhu

Affiliations

Department of Automation, Xiamen University, Xiamen, China.

Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision-Making, Xiamen, China.

Publication information

Front Microbiol. 2020 Aug 25;11:2067. doi: 10.3389/fmicb.2020.02067. eCollection 2020.

Abstract

Capturing group-specific sequences between two groups of genomic/metagenomic sequences is critical for the follow-up identification of single-nucleotide variants (SNVs), gene families, microbial species, or other elements associated with each group. In our study, a sequence that is present or abundant in one group but absent or scarce in the other is considered a "group-specific" sequence. We developed a user-friendly tool, KmerGO, to identify group-specific sequences between two groups of genomic/metagenomic long sequences or high-throughput sequencing datasets. Compared with other tools, KmerGO captures group-specific k-mers (k up to 40 bp) with much lower computing resource requirements and much shorter running time. For a 1.05 TB dataset (.fasta), KmerGO takes about 21.5 h (including k-mer counting) to return assembled group-specific sequences on a regular stand-alone workstation, using no more than 1 GB of memory. Furthermore, KmerGO can also be applied to capture sequences associated with continuous traits. KmerGO uses multi-process parallel computing and provides both a graphical user interface and a command-line interface on Linux and Windows, with no need for pre-installed supporting environments, packages, or complex configuration. The group-specific k-mers or sequences output by KmerGO can serve as inputs to other tools for the downstream discovery of biomarkers, such as genetic variants, species, or genes. KmerGO is available at https://github.com/ChnMasterOG/KmerGO.
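
To make the definition of a "group-specific" k-mer concrete, the toy sketch below keeps k-mers that occur in most samples of one group and in few samples of the other. It is an illustration only and not KmerGO's actual algorithm or interface: the function names, the presence-fraction criterion, and the thresholds min_in and max_out are all assumptions introduced here, and the sketch ignores reverse complements, abundance, and statistical testing.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every k-mer in one sequence (no reverse-complement handling)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def presence_fraction(samples, k):
    """Fraction of samples in a group that contain each k-mer at least once."""
    seen = Counter()
    for seq in samples:
        # set(...) so each sample contributes at most once per k-mer
        seen.update(set(kmer_counts(seq, k)))
    return {kmer: n / len(samples) for kmer, n in seen.items()}


def group_specific_kmers(group_a, group_b, k, min_in=0.8, max_out=0.2):
    """k-mers common in group_a and rare in group_b.

    min_in and max_out are hypothetical thresholds chosen for this sketch.
    """
    frac_a = presence_fraction(group_a, k)
    frac_b = presence_fraction(group_b, k)
    return sorted(
        kmer for kmer, fa in frac_a.items()
        if fa >= min_in and frac_b.get(kmer, 0.0) <= max_out
    )


# Made-up toy sequences, one string per sample.
group_a = ["ACGTACGTGG", "TTACGTACGA"]
group_b = ["GGGGCCCCAA", "CCCCAAAATT"]

specific = group_specific_kmers(group_a, group_b, k=4)
print(specific)
```

Swapping the two group arguments yields the k-mers specific to the second group. Holding all counts in Python dictionaries like this would not scale to terabyte-sized inputs; the sketch only illustrates the definition.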


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e9f7/7477287/75bd0f5ea971/fmicb-11-02067-g001.jpg
