MetageNN: a memory-efficient neural network taxonomic classifier robust to sequencing errors and missing genomes.

Affiliations

School of Computing, National University of Singapore, Singapore, 117417, Republic of Singapore.

Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), Singapore, 138672, Republic of Singapore.

Publication information

BMC Bioinformatics. 2024 Apr 16;25(Suppl 1):153. doi: 10.1186/s12859-024-05760-3.

DOI:10.1186/s12859-024-05760-3

PMID:38627615

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11022314/

Abstract

BACKGROUND

With the rapid increase in throughput of long-read sequencing technologies, recent studies have explored their potential for taxonomic classification by using alignment-based approaches to reduce the impact of higher sequencing error rates. While alignment-based methods are generally slower, k-mer-based taxonomic classifiers can overcome this limitation, potentially at the expense of lower sensitivity for strains and species that are not in the database.

RESULTS

We present MetageNN, a memory-efficient long-read taxonomic classifier that is robust to sequencing errors and missing genomes. MetageNN is a neural network model that uses short k-mer profiles of sequences to reduce the impact of distribution shifts on error-prone long reads. Benchmarking MetageNN against other machine learning approaches for taxonomic classification (GeNet) showed substantial improvements with long-read data (20% improvement in F1 score). On nanopore sequencing data, MetageNN exhibits improved sensitivity when the reference database is incomplete. At the read level, it surpasses the alignment-based tools MetaMaps and MEGAN-LR and the k-mer-based tool Kraken2, with improvements of 100%, 36%, and 23%, respectively. At the community level, MetageNN consistently demonstrated higher sensitivity than these tools. Furthermore, MetageNN requires less than a quarter of the database storage used by Kraken2, MEGAN-LR and MMseqs2, and is more than 7× faster than MetaMaps and GeNet and more than 2× faster than MEGAN-LR and MMseqs2.
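To make the notion of a "short k-mer profile" concrete, here is a minimal sketch of how a read could be turned into a fixed-length feature vector before being passed to a classifier. It is an illustration only, not MetageNN's actual preprocessing: the abstract does not state the value of k, the normalization, or how ambiguous bases are handled, so the function name `kmer_profile`, the default `k=3`, the frequency normalization, and the skipping of non-ACGT windows are all assumptions.

```python
from itertools import product


def kmer_profile(seq, k=3):
    """Return normalized k-mer frequencies as a list of length 4**k.

    Hypothetical helper for illustration; k and the normalization
    are assumptions, not values taken from the paper.
    """
    alphabet = "ACGT"
    # Fixed lexicographic ordering of all 4**k possible k-mers.
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])
        if pos is not None:  # skip windows containing N or other ambiguous bases
            counts[pos] += 1
    total = sum(counts)
    # Divide by the number of counted windows so reads of different
    # lengths yield comparable vectors.
    return [c / total for c in counts] if total else [0.0] * len(counts)
```

With short k, a single substitution error changes at most k overlapping windows, so the profile of a long noisy read stays close to that of the error-free sequence. That is one plausible reading of why short k-mer profiles would soften the effect of sequencing errors.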

CONCLUSION

This proof-of-concept work demonstrates the utility of machine-learning-based methods for taxonomic classification using long reads. MetageNN can be used on sequences not classified by conventional methods and offers an alternative approach for memory-efficient classifiers that can be optimized further.
