Baltoumas Fotis A, Karatzas Evangelos, Paez-Espino David, Venetsianou Nefeli K, Aplakidou Eleni, Oulas Anastasis, Finn Robert D, Ovchinnikov Sergey, Pafilis Evangelos, Kyrpides Nikos C, Pavlopoulos Georgios A
Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece.
Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States.
Front Bioinform. 2023 Mar 3;3:1157956. doi: 10.3389/fbinf.2023.1157956. eCollection 2023.
Metagenomics has made it possible to access the genetic repertoire of natural microbial communities, and shotgun metagenome sequencing has become the method of choice for studying and classifying microorganisms from diverse environments. To this end, several methods have been developed to process and analyze sequence data, from raw reads to end products such as predicted protein sequences or families. In this article, we review these processing steps and discuss the alternative methodologies that can be followed to explore biodiversity at the protein family level. We describe the available analysis tools and comment on their scalability, advantages, and disadvantages. Finally, we report the available data repositories and recommend approaches for protein family annotation related to phylogenetic distribution, structure prediction, and metadata enrichment.