Institute of Medical Microbiology and Hospital Hygiene, Heinrich-Heine-University Düsseldorf, Düsseldorf, North Rhine-Westphalia, Germany.
Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, 20892, USA.
Nat Commun. 2019 Jul 11;10(1):3066. doi: 10.1038/s41467-019-10934-2.
Metagenomic sequence classification should be fast, accurate and information-rich. Emerging long-read sequencing technologies promise to improve the balance between these factors, but most existing methods were designed for short reads. MetaMaps is a new method, specifically developed for long reads, capable of mapping a long-read metagenome to a comprehensive RefSeq database with >12,000 genomes in <16 GB of RAM on a laptop computer. Integrating approximate mapping with probabilistic scoring and EM-based estimation of sample composition, MetaMaps achieves >94% accuracy for species-level read assignment and r > 0.97 for the estimation of sample composition on both simulated and real data when the sample genomes or close relatives are present in the classification database. To address novel species and genera, which are comparatively harder to predict, MetaMaps outputs mapping locations and qualities for all classified reads, enabling functional studies (e.g. gene presence/absence) and detection of incongruities between sample and reference genomes.
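To illustrate what EM-based estimation of sample composition means in general, the following is a minimal sketch of a textbook expectation-maximization loop for mixture proportions. It is not the MetaMaps implementation: the function name `estimate_composition`, the `likelihoods` input layout, and the toy numbers are all assumptions made for this example, and the per-read likelihoods are assumed to come from some upstream probabilistic mapping score.

```python
def estimate_composition(likelihoods, n_iter=200, tol=1e-12):
    """Estimate genome proportions with a generic EM loop.

    likelihoods[i][j] is a hypothetical likelihood of read i given that it
    originates from genome j (an assumed input, not a MetaMaps data format).
    Returns one weight per genome; weights sum to 1.
    """
    n_genomes = len(likelihoods[0])
    weights = [1.0 / n_genomes] * n_genomes  # start from a uniform composition

    for _ in range(n_iter):
        # E-step: accumulate each read's posterior probability per genome.
        counts = [0.0] * n_genomes
        for row in likelihoods:
            joint = [w * lik for w, lik in zip(weights, row)]
            total = sum(joint)
            if total == 0.0:
                continue  # read has no plausible source; skip it
            for j, value in enumerate(joint):
                counts[j] += value / total

        assigned = sum(counts)
        if assigned == 0.0:
            return weights  # nothing could be assigned; keep current estimate

        # M-step: new composition is the normalized expected read count.
        new_weights = [c / assigned for c in counts]
        delta = max(abs(a - b) for a, b in zip(new_weights, weights))
        weights = new_weights
        if delta < tol:
            break

    return weights


# Toy input: two reads fit only genome 0, one fits only genome 1,
# and one read is equally compatible with both.
toy_likelihoods = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]
composition = estimate_composition(toy_likelihoods)
```

In this toy case the ambiguous read is split according to the current composition estimate, so the weights settle near two thirds for the first genome and one third for the second. This shows how unambiguous reads inform the assignment of ambiguous ones.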