Functional assignment of metagenomic data: challenges and applications.

Affiliation

Laboratory for MetaSystems Research at RIKEN, Japan.

Publication information

Brief Bioinform. 2012 Nov;13(6):711-27. doi: 10.1093/bib/bbs033. Epub 2012 Jul 6.

DOI: 10.1093/bib/bbs033

PMID: 22772835

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3504928/

Abstract

Metagenomic sequencing provides a unique opportunity to explore Earth's limitless environments harboring scores of yet unknown and mostly unculturable microbes and other organisms. Functional analysis of the metagenomic data plays a central role in projects aiming to explore the most essential questions in microbiology, namely 'In a given environment, among the microbes present, what are they doing, and how are they doing it?' Toward this goal, several large-scale metagenomic projects have recently been conducted or are currently underway. Functional analysis of metagenomic data mainly suffers from the vast amount of data generated in these projects. The sheer amount of data requires much computational time and storage space. These problems are compounded by other factors potentially affecting the functional analysis, including sample preparation, sequencing method and average genome size of the metagenomic samples. In addition, the read-lengths generated during sequencing influence sequence assembly, gene prediction and subsequently the functional analysis. The level of confidence for functional predictions increases with increasing read-length. Usually, the most reliable functional annotations for metagenomic sequences are achieved using homology-based approaches against publicly available reference sequence databases. Here, we present an overview of the current state of functional analysis of metagenomic sequence data, bottlenecks frequently encountered and possible solutions in light of currently available resources and tools. Finally, we provide some examples of applications from recent metagenomic studies which have been successfully conducted in spite of the known difficulties.
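The abstract notes that the most reliable functional annotations usually come from homology searches against public reference databases. Below is a minimal sketch of one common way to turn such search results into per-read annotations: a best-hit assignment over BLAST+ tabular output (`-outfmt 6`, default 12 columns). It is not the procedure used in the paper. The cutoffs (`max_evalue`, `min_aln_len`), the helper name `best_hit_functions`, the reference IDs and the `subject_to_function` mapping are all illustrative assumptions. The alignment-length filter echoes the abstract's read-length point: short reads yield short alignments, which are more likely to be discarded or to score weakly.

```python
from typing import Dict, Iterable, Optional, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def best_hit_functions(
    lines: Iterable[str],
    subject_to_function: Dict[str, str],
    max_evalue: float = 1e-5,  # illustrative cutoff, not from the paper
    min_aln_len: int = 30,     # illustrative cutoff; short reads give short alignments
) -> Dict[str, Optional[str]]:
    """Give each query the function of its highest-bit-score passing hit.

    Queries with no passing hit, or whose best subject is missing from
    subject_to_function, map to None (left unannotated).
    """
    best: Dict[str, Tuple[float, str]] = {}  # qseqid -> (bitscore, sseqid)
    seen = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers (e.g. -outfmt 7)
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        query = row["qseqid"]
        seen.add(query)
        if float(row["evalue"]) > max_evalue or int(row["length"]) < min_aln_len:
            continue
        score = float(row["bitscore"])
        if query not in best or score > best[query][0]:
            best[query] = (score, row["sseqid"])
    return {
        q: subject_to_function.get(best[q][1]) if q in best else None
        for q in sorted(seen)
    }


# Hypothetical hits: read IDs, reference IDs and function labels are made up.
hits = [
    "read1\tref_A\t98.0\t60\t1\t0\t1\t180\t10\t69\t1e-30\t120.0",
    "read1\tref_B\t45.0\t55\t30\t1\t1\t165\t5\t59\t1e-8\t48.0",
    "read2\tref_B\t40.0\t20\t12\t0\t1\t60\t1\t20\t0.01\t25.0",
]
functions = {"ref_A": "protein family A", "ref_B": "protein family B"}
print(best_hit_functions(hits, functions))
# {'read1': 'protein family A', 'read2': None}
```

Here `read2` stays unannotated because its only hit is both too short and above the e-value cutoff. Taking the single best hit keeps per-read annotation cheap, but it can propagate an error from a mis-annotated reference entry. For that reason, pipelines often add stricter cutoffs or take a consensus over several top hits.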

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a549/3504928/223482110ee7/bbs033f1.jpg

Similar articles

Functional assignment of metagenomic data: challenges and applications.

Brief Bioinform. 2012 Nov;13(6):711-27. doi: 10.1093/bib/bbs033. Epub 2012 Jul 6.

Use of whole genome shotgun metagenomics: a practical guide for the microbiome-minded physician scientist.

Semin Reprod Med. 2014 Jan;32(1):5-13. doi: 10.1055/s-0033-1361817. Epub 2014 Jan 3.

COGNIZER: A Framework for Functional Annotation of Metagenomic Datasets.

PLoS One. 2015 Nov 11;10(11):e0142102. doi: 10.1371/journal.pone.0142102. eCollection 2015.

Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes.

Brief Bioinform. 2019 Jul 19;20(4):1140-1150. doi: 10.1093/bib/bbx098.

MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach.

Gigascience. 2017 Mar 1;6(3):1-10. doi: 10.1093/gigascience/gix007.

Gene prediction in metagenomic fragments: a large scale machine learning approach.

BMC Bioinformatics. 2008 Apr 28;9:217. doi: 10.1186/1471-2105-9-217.

Metagenomic Assembly: Overview, Challenges and Applications.

Yale J Biol Med. 2016 Sep 30;89(3):353-362. eCollection 2016 Sep.

IDMIL: an alignment-free Interpretable Deep Multiple Instance Learning (MIL) for predicting disease from whole-metagenomic data.

Bioinformatics. 2020 Jul 1;36(Suppl_1):i39-i47. doi: 10.1093/bioinformatics/btaa477.

UltraSEQ, a Universal Bioinformatic Platform for Information-Based Clinical Metagenomics and Beyond.

Microbiol Spectr. 2023 Jun 15;11(3):e0416022. doi: 10.1128/spectrum.04160-22. Epub 2023 Apr 11.

Cited by

Fungal Identifier (FId): An Updated Polymerase Chain Reaction-Restriction Fragment Length Polymorphism Approach to Ease Ascomycetous Yeast Isolates' Identification in Ecological Studies.

J Fungi (Basel). 2024 Aug 23;10(9):595. doi: 10.3390/jof10090595.

Multi-omics approaches to studying gastrointestinal microbiome in the context of precision medicine and machine learning.

Front Mol Biosci. 2024 Jan 19;10:1337373. doi: 10.3389/fmolb.2023.1337373. eCollection 2023.

A laboratory ice machine as a cold oligotrophic artificial microbial niche for biodiscovery.

Sci Rep. 2023 Dec 12;13(1):22089. doi: 10.1038/s41598-023-49017-0.

South Africa's indigenous microbial diversity for industrial applications: A review of the current status and opportunities.

Heliyon. 2023 Jun 1;9(6):e16723. doi: 10.1016/j.heliyon.2023.e16723. eCollection 2023 Jun.

The potential for plant growth-promoting bacteria to impact crop productivity in future agricultural systems is linked to understanding the principles of microbial ecology.

Front Microbiol. 2023 May 19;14:1141862. doi: 10.3389/fmicb.2023.1141862. eCollection 2023.

Exploring the Influence of Gut Microbiome on Energy Metabolism in Humans.

Adv Nutr. 2023 Jul;14(4):840-857. doi: 10.1016/j.advnut.2023.03.015. Epub 2023 Apr 7.

Microbiome and Physicochemical Features Associated with Differential Listeria monocytogenes Growth in Soft, Surface-Ripened Cheeses.

Appl Environ Microbiol. 2023 Apr 26;89(4):e0200422. doi: 10.1128/aem.02004-22. Epub 2023 Mar 28.

GCNSA: DNA storage encoding with a graph convolutional network and self-attention.

iScience. 2023 Feb 19;26(3):106231. doi: 10.1016/j.isci.2023.106231. eCollection 2023 Mar 17.

Scientific novelty beyond the experiment.

Microb Biotechnol. 2023 Jun;16(6):1131-1173. doi: 10.1111/1751-7915.14222. Epub 2023 Feb 14.

Functional characterization of prokaryotic dark matter: the road so far and what lies ahead.

Curr Res Microb Sci. 2022 Aug 7;3:100159. doi: 10.1016/j.crmicr.2022.100159. eCollection 2022.

References

Fast and accurate taxonomic assignments of metagenomic sequences using MetaBin.

PLoS One. 2012;7(4):e34030. doi: 10.1371/journal.pone.0034030. Epub 2012 Apr 4.

IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth.

Bioinformatics. 2012 Jun 1;28(11):1420-8. doi: 10.1093/bioinformatics/bts174. Epub 2012 Apr 11.

Assessment of metagenomic assembly using simulated next generation sequencing data.

PLoS One. 2012;7(2):e31386. doi: 10.1371/journal.pone.0031386. Epub 2012 Feb 23.

Comparative metaproteomics and diversity analysis of human intestinal microbiota testifies for its temporal stability and expression of core functions.

PLoS One. 2012;7(1):e29913. doi: 10.1371/journal.pone.0029913. Epub 2012 Jan 18.

Single cell genome sequencing.

Curr Opin Biotechnol. 2012 Jun;23(3):437-43. doi: 10.1016/j.copbio.2011.11.018. Epub 2011 Dec 7.

Database resources of the National Center for Biotechnology Information.

Nucleic Acids Res. 2012 Jan;40(Database issue):D13-25. doi: 10.1093/nar/gkr1184. Epub 2011 Dec 2.

A metaproteomic approach to study human-microbial ecosystems at the mucosal luminal interface.

PLoS One. 2011;6(11):e26542. doi: 10.1371/journal.pone.0026542. Epub 2011 Nov 21.

The Pfam protein families database.

Nucleic Acids Res. 2012 Jan;40(Database issue):D290-301. doi: 10.1093/nar/gkr1065. Epub 2011 Nov 29.

Reorganizing the protein space at the Universal Protein Resource (UniProt).

Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5. doi: 10.1093/nar/gkr981. Epub 2011 Nov 18.

eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges.

Nucleic Acids Res. 2012 Jan;40(Database issue):D284-9. doi: 10.1093/nar/gkr1060. Epub 2011 Nov 16.
