Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. He also serves as chief technology officer of HaploX Biotechnology. He is the initiator of the OpenGene projects and a contributor to many open-source tools.
department of bioinformatics, HaploX Biotechnology.
Brief Bioinform. 2021 Mar 22;22(2):924-935. doi: 10.1093/bib/bbaa231.
In this paper, we present a toolset and related resources for rapid identification of viruses and microorganisms from short-read or long-read sequencing data. The core tool, fastv, is an ultra-fast program that detects microbial sequences in sequencing data, identifies target microorganisms and visualizes coverage of microbial genomes. It is based on a k-mer mapping and extension method. The k-mer sets are generated by UniqueKMER, another tool in this toolset, which can generate complete sets of unique k-mers for each genome within a large set of viral or microbial genomes. For convenience, unique k-mers for microorganisms and for common viruses that afflict humans have been pre-generated and are provided with the tools. As a lightweight tool, fastv accepts FASTQ data as input and directly outputs results in both HTML and JSON formats. Before the k-mer analysis, fastv automatically performs adapter trimming, quality pruning, base correction and other preprocessing to ensure the accuracy of the analysis. In particular, fastv provides built-in support for rapid identification and typing of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Experimental results showed that fastv achieved 100% sensitivity and 100% specificity for detecting SARS-CoV-2 from sequencing data, and that it can distinguish SARS-CoV-2 from SARS, Middle East respiratory syndrome and other coronaviruses. The toolset is available at https://github.com/OpenGene/fastv.
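The idea of unique k-mers and k-mer mapping can be illustrated with a minimal Python sketch. This is not the fastv or UniqueKMER implementation: the function names (`kmers`, `unique_kmers`, `count_hits`) are hypothetical, reverse complements are ignored, and the extension step mentioned in the abstract is not modeled. It only shows the principle that a k-mer occurring in exactly one genome of a collection can serve as evidence for that genome when it is found in a read.

```python
from collections import defaultdict

VALID_BASES = set("ACGT")


def kmers(seq, k):
    """Return the set of k-mers in seq, skipping any that contain non-ACGT bases."""
    seq = seq.upper()
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= VALID_BASES
    }


def unique_kmers(genomes, k):
    """For each genome, keep only the k-mers that occur in no other genome.

    genomes: dict mapping a genome name to its sequence (illustrative input format).
    """
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)

    unique = {name: set() for name in genomes}
    for kmer, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].add(kmer)
    return unique


def count_hits(reads, unique, k):
    """Count, per genome, how many distinct k-mers of each read match its unique set."""
    lookup = {kmer: name for name, ks in unique.items() for kmer in ks}
    hits = {name: 0 for name in unique}
    for read in reads:
        for kmer in kmers(read, k):
            name = lookup.get(kmer)
            if name is not None:
                hits[name] += 1
    return hits
```

With two toy genomes that share a prefix, for example `{"A": "ACGTACGTGG", "B": "ACGTTTTTGG"}` and `k = 4`, the shared k-mer `ACGT` is excluded from both unique sets, so a read is attributed to a genome only through k-mers that occur in that genome alone.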