MetageNN: a memory-efficient neural network taxonomic classifier robust to sequencing errors and missing genomes.

Affiliations

School of Computing, National University of Singapore, Singapore, 117417, Republic of Singapore.

Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), Singapore, 138672, Republic of Singapore.

Publication information

BMC Bioinformatics. 2024 Apr 16;25(Suppl 1):153. doi: 10.1186/s12859-024-05760-3.

DOI:10.1186/s12859-024-05760-3
PMID:38627615
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11022314/
Abstract

BACKGROUND

With the rapid increase in throughput of long-read sequencing technologies, recent studies have explored their potential for taxonomic classification by using alignment-based approaches to reduce the impact of higher sequencing error rates. While alignment-based methods are generally slower, k-mer-based taxonomic classifiers can overcome this limitation, potentially at the expense of lower sensitivity for strains and species that are not in the database.

RESULTS

We present MetageNN, a memory-efficient long-read taxonomic classifier that is robust to sequencing errors and missing genomes. MetageNN is a neural network model that uses short k-mer profiles of sequences to reduce the impact of distribution shifts on error-prone long reads. Benchmarking MetageNN against another machine learning approach for taxonomic classification (GeNet) showed substantial improvements with long-read data (20% improvement in F1 score). On nanopore sequencing data, MetageNN shows improved sensitivity when the reference database is incomplete. At the read level, it surpasses the alignment-based tools MetaMaps and MEGAN-LR and the k-mer-based tool Kraken2, with improvements of 100%, 36%, and 23%, respectively. At the community level, MetageNN consistently demonstrated higher sensitivity than these tools. Furthermore, MetageNN requires less than a quarter of the database storage used by Kraken2, MEGAN-LR and MMseqs2, and is more than 7× faster than MetaMaps and GeNet and more than 2× faster than MEGAN-LR and MMseqs2.
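To make the idea of a "short k-mer profile" concrete, the following is a minimal sketch, not the authors' implementation. It turns a read into a normalized k-mer frequency vector and measures how far a single substitution error moves that vector. The function names `kmer_profile` and `l1_distance`, the choice `k = 3`, and the example read are all assumptions made for illustration; the abstract does not state the k used by MetageNN or how its profiles are normalized.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int = 3) -> list[float]:
    """Normalized k-mer frequencies over the ACGT alphabet.

    The vector has 4**k entries in lexicographic k-mer order.
    Windows containing other symbols (for example N) are ignored.
    k = 3 is an illustrative choice, not a value from the paper.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]


def l1_distance(p: list[float], q: list[float]) -> float:
    """Sum of absolute differences between two profiles of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical read and a copy with one simulated substitution error.
read = "ACGTTGCAAGGCTTAACCGGATCGATCGTTAGC"
noisy = read[:15] + "T" + read[16:]

shift = l1_distance(kmer_profile(read), kmer_profile(noisy))
print(f"L1 shift caused by one substitution (k=3): {shift:.4f}")
```

A single substitution touches at most k overlapping windows, so with a short k only a few entries of the profile change. A vector like this could serve as the fixed-length input to a neural network classifier, which is the general setup the abstract describes.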

CONCLUSION

This proof-of-concept work demonstrates the utility of machine-learning-based methods for taxonomic classification using long reads. MetageNN can be used on sequences not classified by conventional methods and offers an alternative approach for memory-efficient classifiers that can be optimized further.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5f5/11022314/6ce7d05544e3/12859_2024_5760_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5f5/11022314/e3ef90534272/12859_2024_5760_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5f5/11022314/38fb0df4e92a/12859_2024_5760_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5f5/11022314/853a3346330c/12859_2024_5760_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5f5/11022314/fd5511bb5908/12859_2024_5760_Fig5_HTML.jpg

Similar articles

1
MetageNN: a memory-efficient neural network taxonomic classifier robust to sequencing errors and missing genomes.
BMC Bioinformatics. 2024 Apr 16;25(Suppl 1):153. doi: 10.1186/s12859-024-05760-3.
2
Evaluation of taxonomic classification and profiling methods for long-read shotgun metagenomic sequencing datasets.
BMC Bioinformatics. 2022 Dec 13;23(1):541. doi: 10.1186/s12859-022-05103-0.
3
MEGAN-LR: new algorithms allow accurate binning and easy interactive exploration of metagenomic long reads and contigs.
Biol Direct. 2018 Apr 20;13(1):6. doi: 10.1186/s13062-018-0208-7.
4
MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach.
Gigascience. 2017 Mar 1;6(3):1-10. doi: 10.1093/gigascience/gix007.
5
Reference-Free Plant Disease Detection Using Machine Learning and Long-Read Metagenomic Sequencing.
Appl Environ Microbiol. 2023 Jun 28;89(6):e0026023. doi: 10.1128/aem.00260-23. Epub 2023 May 15.
6
Large-scale machine learning for metagenomics sequence classification.
Bioinformatics. 2016 Apr 1;32(7):1023-32. doi: 10.1093/bioinformatics/btv683. Epub 2015 Nov 20.
7
Taxometer: Improving taxonomic classification of metagenomics contigs.
Nat Commun. 2024 Sep 27;15(1):8357. doi: 10.1038/s41467-024-52771-y.
8
Benchmarking taxonomic classifiers with Illumina and Nanopore sequence data for clinical metagenomic diagnostic applications.
Microb Genom. 2022 Oct;8(10). doi: 10.1099/mgen.0.000886.
9
Deep learning models for bacteria taxonomic classification of metagenomic data.
BMC Bioinformatics. 2018 Jul 9;19(Suppl 7):198. doi: 10.1186/s12859-018-2182-6.
10
Strain-level metagenomic assignment and compositional estimation for long reads with MetaMaps.
Nat Commun. 2019 Jul 11;10(1):3066. doi: 10.1038/s41467-019-10934-2.

Cited by

1
Challenges and Opportunities in Analyzing Cancer-Associated Microbiomes.
Cancer Res. 2025 Aug 12. doi: 10.1158/0008-5472.CAN-24-3629.
2
Evaluation of the taxonomic classification tools and visualizers for metagenomic analysis using the Oxford nanopore sequence database.
J Appl Genet. 2025 Mar 29. doi: 10.1007/s13353-025-00962-8.
3
Lightweight taxonomic profiling of long-read metagenomic datasets with Lemur and Magnet.
