
Correcting the Estimation of Viral Taxa Distributions in Next-Generation Sequencing Data after Applying Artificial Neural Networks.

Affiliation

Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, D-30559 Hannover, Germany.

Publication information

Genes (Basel). 2021 Oct 31;12(11):1755. doi: 10.3390/genes12111755.

DOI:10.3390/genes12111755
PMID:34828361
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8624964/
Abstract

Estimating the taxonomic composition of viral sequences in biological samples processed by next-generation sequencing is an important step in comparative metagenomics. Mapping sequencing reads against a database of known viral reference genomes, however, fails to classify reads from novel viruses whose reference sequences are not yet available in public databases. As an alternative to mapping, this study examined how well artificial neural networks and other machine learning models classify sequencing reads at least to a taxonomic level. Taxonomic and genomic data from the NCBI database were used to sample labelled sequencing reads as training data. The fitted neural network was then applied to classify unlabelled reads in simulated and real-world test sets. Additional auxiliary test sets of labelled reads were used to estimate the conditional class probabilities and to correct the prior estimation of the taxonomic distribution in the actual test set. Among the taxonomic levels, the biological order of viruses provided the most comprehensive database for generating training data. The prediction accuracy of the artificial neural network in classifying test reads to their viral order was considerably higher than that of a random classification. Posterior estimation of taxa frequencies could correct the primary classification results.
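
The correction step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the conditional class probabilities are taken as a row-normalised confusion matrix from the auxiliary labelled reads, and that the class proportions are then re-estimated by repeatedly applying Bayes' theorem to the observed predicted-class frequencies. The function names, the three viral orders and all numbers are hypothetical and chosen only for illustration.

```python
from collections import Counter


def conditional_probs(true_labels, predicted_labels, classes):
    """Row-normalised confusion matrix: cond[i][j] ~ P(predicted = j | true = i).

    Estimated from an auxiliary test set whose true labels are known.
    """
    counts = Counter(zip(true_labels, predicted_labels))
    cond = []
    for t in classes:
        row_total = sum(counts[(t, p)] for p in classes)
        if row_total == 0:
            raise ValueError(f"no auxiliary reads with true class {t!r}")
        cond.append([counts[(t, p)] / row_total for p in classes])
    return cond


def correct_distribution(observed, cond, n_iter=2000):
    """Re-estimate true class proportions from predicted-class frequencies.

    observed[j] is the fraction of test reads the classifier assigned to class j.
    Starts from a uniform prior and iterates the Bayes update (an assumption of
    this sketch, not a detail taken from the paper).
    """
    k = len(observed)
    prior = [1.0 / k] * k
    for _ in range(n_iter):
        # Probability of each predicted class under the current prior.
        marginal = [sum(prior[i] * cond[i][j] for i in range(k)) for j in range(k)]
        # Posterior P(true = i | predicted = j), averaged over observed predictions.
        prior = [
            sum(
                observed[j] * prior[i] * cond[i][j] / marginal[j]
                for j in range(k)
                if marginal[j] > 0
            )
            for i in range(k)
        ]
    return prior


# Hypothetical example with three viral orders.
classes = ["Caudovirales", "Herpesvirales", "Picornavirales"]
cond = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]
# Predicted-class frequencies constructed from true proportions (0.7, 0.2, 0.1).
observed = [0.61, 0.23, 0.16]
corrected = correct_distribution(observed, cond)
for name, raw, fixed in zip(classes, observed, corrected):
    print(f"{name}: classified {raw:.3f}, corrected {fixed:.3f}")
```

In this toy setting the raw classification understates the dominant order because some of its reads are misassigned, and the iterative update moves the estimate back toward the proportions that generated the data. With real data the quality of the correction depends on how well the auxiliary test set reflects the classifier's error rates on the actual sample.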

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe1f/8624964/23a2b154e3d3/genes-12-01755-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe1f/8624964/46cf4aea97d7/genes-12-01755-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe1f/8624964/6e1b7ef0541b/genes-12-01755-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe1f/8624964/0d5d568f7b72/genes-12-01755-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe1f/8624964/967c41acc097/genes-12-01755-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe1f/8624964/165aad923080/genes-12-01755-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe1f/8624964/a2db39ea58e8/genes-12-01755-g007.jpg
