Methods for automatic reference trees and multilevel phylogenetic placement.

Affiliations

Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.

Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany.

Publication information

Bioinformatics. 2019 Apr 1;35(7):1151-1158. doi: 10.1093/bioinformatics/bty767.

DOI:10.1093/bioinformatics/bty767

PMID:30169747

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6449752/

Abstract

MOTIVATION

In most metagenomic sequencing studies, the initial analysis step consists in assessing the evolutionary provenance of the sequences. Phylogenetic (or Evolutionary) Placement methods can be employed to determine the evolutionary position of sequences with respect to a given reference phylogeny. These placement methods do however face certain limitations: The manual selection of reference sequences is labor-intensive; the computational effort to infer reference phylogenies is substantially larger than for methods that rely on sequence similarity; the number of taxa in the reference phylogeny should be small enough to allow for visually inspecting the results.

RESULTS

We present algorithms to overcome the above limitations. First, we introduce a method to automatically construct representative sequences from databases to infer reference phylogenies. Second, we present an approach for conducting large-scale phylogenetic placements on nested phylogenies. Third, we describe a preprocessing pipeline that allows for handling huge sequence datasets. Our experiments on empirical data show that our methods substantially accelerate the workflow and yield highly accurate placement results.
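The abstract does not spell out how representative sequences are constructed. As a rough illustration only, the sketch below assumes one simple possibility: sequences in a database are already aligned and labeled with a taxonomic path, and each taxonomic group is summarized by a majority-rule consensus sequence that could then serve as a tip in a reference phylogeny. The function names, the record format, and the example taxon labels are hypothetical and are not taken from gappa.

```python
from collections import Counter, defaultdict


def majority_consensus(aligned_seqs):
    """Return the majority-rule consensus of equally long aligned sequences.

    Gap characters are counted like any other character, so a column that is
    mostly gaps yields a gap in the consensus (an assumption of this sketch).
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(char.upper() for char in column)
        # Ties are resolved in favor of the character seen first in the column.
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


def consensus_per_taxon(records):
    """Group (taxon_label, aligned_sequence) pairs and build one consensus each."""
    groups = defaultdict(list)
    for taxon, seq in records:
        groups[taxon].append(seq)
    return {taxon: majority_consensus(seqs) for taxon, seqs in groups.items()}


# Hypothetical toy input: taxonomic path plus aligned sequence.
records = [
    ("Bacteria;Firmicutes", "ACGT-A"),
    ("Bacteria;Firmicutes", "ACGTTA"),
    ("Bacteria;Firmicutes", "ACCTTA"),
    ("Archaea;Euryarchaeota", "TTGACA"),
]
representatives = consensus_per_taxon(records)
for taxon, seq in representatives.items():
    print(taxon, seq)
```

Each resulting representative stands in for a whole group, which keeps the number of tips in the reference tree small. A real pipeline would also have to decide which taxonomic groups to summarize and how to treat ambiguous columns, neither of which this sketch addresses.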

AVAILABILITY AND IMPLEMENTATION

Freely available under GPLv3 at http://github.com/lczech/gappa.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
