pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree.

Author information

Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA.

Publication information

BMC Bioinformatics. 2010 Oct 30;11:538. doi: 10.1186/1471-2105-11-538.

Abstract

BACKGROUND

Likelihood-based phylogenetic inference is generally considered to be the most reliable classification method for unknown sequences. However, traditional likelihood-based phylogenetic methods cannot be applied to large volumes of short reads from next-generation sequencing due to computational complexity issues and lack of phylogenetic signal. "Phylogenetic placement," where a reference tree is fixed and the unknown query sequences are placed onto the tree via a reference alignment, is a way to bring the inferential power offered by likelihood-based approaches to large data sets.

RESULTS

This paper introduces pplacer, a software package for phylogenetic placement and subsequent visualization. The algorithm can place twenty thousand short reads on a reference tree of one thousand taxa per hour per processor, has essentially linear time and memory complexity in the number of reference taxa, and is easy to run in parallel. Pplacer features calculation of the posterior probability of a placement on an edge, which is a statistically rigorous way of quantifying uncertainty on an edge-by-edge basis. It also can inform the user of the positional uncertainty for query sequences by calculating expected distance between placement locations, which is crucial in the estimation of uncertainty with a well-sampled reference tree. The software provides visualizations using branch thickness and color to represent number of placements and their uncertainty. A simulation study using reads generated from 631 COG alignments shows a high level of accuracy for phylogenetic placement over a wide range of alignment diversity, and the power of edge uncertainty estimates to measure placement confidence.
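The two uncertainty measures described above can be illustrated with a small Python sketch. The abstract does not give formulas, so this assumes two things: that per-edge confidence is obtained by normalizing the likelihoods of the candidate placements so they sum to one, and that the expected distance between placement locations is the probability-weighted average of pairwise tree distances between those candidates. The function names, the log-likelihood values, and the distance matrix are hypothetical and are not pplacer's API or output format.

```python
import math


def normalized_placement_weights(log_likelihoods):
    """Turn per-edge log-likelihoods into weights that sum to 1.

    Assumption: each candidate edge's weight is its likelihood divided by
    the sum over all candidate edges. Subtracting the maximum first keeps
    exp() from underflowing on very negative log-likelihoods.
    """
    highest = max(log_likelihoods)
    scaled = [math.exp(ll - highest) for ll in log_likelihoods]
    total = sum(scaled)
    return [s / total for s in scaled]


def expected_distance_between_placements(weights, distances):
    """Weighted average pairwise distance between candidate placements.

    Assumption: distances[i][j] is the tree path length between candidate
    placements i and j, with zeros on the diagonal.
    """
    n = len(weights)
    return sum(
        weights[i] * weights[j] * distances[i][j]
        for i in range(n)
        for j in range(n)
    )


# Hypothetical query with two equally likely candidate edges, 2.0 apart.
example_log_likelihoods = [-1200.0, -1200.0]
example_distances = [
    [0.0, 2.0],
    [2.0, 0.0],
]

example_weights = normalized_placement_weights(example_log_likelihoods)
example_spread = expected_distance_between_placements(
    example_weights, example_distances
)
print(example_weights)  # [0.5, 0.5]
print(example_spread)   # 1.0
```

Under these assumptions, a query whose candidate edges are all close together on the tree yields a small expected distance even when no single edge has high weight, which is why the two measures are complementary on a well-sampled reference tree. Posterior probabilities, where available, could be passed in place of the normalized weights.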

CONCLUSIONS

Pplacer enables efficient phylogenetic placement and subsequent visualization, making likelihood-based phylogenetics methodology practical for large collections of reads; it is freely available as source code, binaries, and a web service.
