Methods for automatic reference trees and multilevel phylogenetic placement.

Affiliations

Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.

Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany.

Publication information

Bioinformatics. 2019 Apr 1;35(7):1151-1158. doi: 10.1093/bioinformatics/bty767.

Abstract

MOTIVATION

In most metagenomic sequencing studies, the initial analysis step consists of assessing the evolutionary provenance of the sequences. Phylogenetic (or Evolutionary) Placement methods can be employed to determine the evolutionary position of sequences with respect to a given reference phylogeny. These placement methods do, however, face certain limitations: the manual selection of reference sequences is labor-intensive; the computational effort to infer reference phylogenies is substantially larger than for methods that rely on sequence similarity; and the number of taxa in the reference phylogeny should be small enough to allow for visual inspection of the results.

RESULTS

We present algorithms to overcome the above limitations. First, we introduce a method to automatically construct representative sequences from databases to infer reference phylogenies. Second, we present an approach for conducting large-scale phylogenetic placements on nested phylogenies. Third, we describe a preprocessing pipeline that allows for handling huge sequence datasets. Our experiments on empirical data show that our methods substantially accelerate the workflow and yield highly accurate placement results.
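
The abstract does not say how the representative sequences are constructed. As a purely illustrative sketch, one common way to summarize a group of aligned sequences is a majority-rule consensus: take the most frequent character in each alignment column. The function name `majority_consensus`, the tie-breaking rule, and the toy input are assumptions made for this example and are not taken from the paper or from gappa.

```python
from collections import Counter


def majority_consensus(aligned):
    """Return a majority-rule consensus of equal-length aligned sequences.

    Hypothetical helper for illustration only; not the authors' method.
    Ties are broken by picking the character that sorts first.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")

    consensus = []
    for column in zip(*aligned):
        counts = Counter(column)
        # Iterate characters in sorted order so that max() returns the
        # alphabetically first character among equally frequent ones.
        best = max(sorted(counts), key=lambda ch: counts[ch])
        consensus.append(best)
    return "".join(consensus)


# Toy alignment standing in for the sequences of one taxonomic group.
group = ["ACGT", "ACGA", "TCGA"]
representative = majority_consensus(group)
```

A representative built this way could then stand in for its whole group when inferring a reference tree, which keeps the number of taxa small; whether and how the published method does this is described in the full paper, not in this abstract.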

AVAILABILITY AND IMPLEMENTATION

Freely available under GPLv3 at http://github.com/lczech/gappa.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
