Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
Genome Biol. 2018 Nov 16;19(1):198. doi: 10.1186/s13059-018-1568-0.
False-positive identifications are a significant problem in metagenomics classification. We present KrakenUniq, a novel metagenomics classifier that combines the fast k-mer-based classification of Kraken with an efficient algorithm for assessing the coverage of unique k-mers found in each species in a dataset. On various test datasets, KrakenUniq gives better recall and precision than other methods and effectively classifies and distinguishes pathogens with low abundance from false positives in infectious disease samples. By using the probabilistic cardinality estimator HyperLogLog, KrakenUniq runs as fast as Kraken and requires little additional memory. KrakenUniq is freely available at https://github.com/fbreitwieser/krakenuniq.
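To illustrate the idea of estimating the number of unique k-mers with HyperLogLog, here is a minimal sketch of the textbook estimator in Python. It is not KrakenUniq's implementation. The class name `HyperLogLog`, the helper `kmers`, the precision `p=12`, the choice of hash function, and the k-mer length used in the usage note are all assumptions made for this example.

```python
import hashlib
import math


def kmers(seq, k):
    """Yield every substring of length k in seq (hypothetical helper)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class HyperLogLog:
    """Textbook HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p                      # assumed precision; 2**12 = 4096 registers
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash of the item; blake2b is an arbitrary choice for this sketch.
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        idx = h >> (64 - self.p)                  # first p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)          # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            return m * math.log(m / zeros)        # small-range (linear counting) correction
        return raw
```

Under these assumptions, one sketch per taxon could be fed with `for km in kmers(read, 31): hll.add(km)`, and `hll.estimate()` would approximate the number of distinct k-mers seen. Memory stays fixed at 2**p registers no matter how many reads are added, and re-adding a k-mer that was already seen leaves the registers unchanged, which is why repeated reads from one region do not inflate the count.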