Phylogenetic placement of metagenomic reads using the minimum evolution principle.

Author information

Filipski Alan, Tamura Koichiro, Billing-Ross Paul, Murillo Oscar, Kumar Sudhir

Publication information

BMC Genomics. 2015;16 Suppl 1(Suppl 1):S13. doi: 10.1186/1471-2164-16-S1-S13. Epub 2015 Jan 15.

DOI:10.1186/1471-2164-16-S1-S13

PMID:25923672

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4315155/

Abstract

BACKGROUND

A central problem of computational metagenomics is determining the correct placement into an existing phylogenetic tree of individual reads (nucleotide sequences of varying lengths, ranging from hundreds to thousands of bases) obtained using next-generation sequencing of DNA samples from a mixture of known and unknown species. Correct placement allows us to easily identify or classify the sequences in the sample as to taxonomic position or function.

RESULTS

Here we propose a novel method (PhyClass), based on the Minimum Evolution (ME) phylogenetic inference criterion, for determining the appropriate phylogenetic position of each read. Without using heuristics, the new approach efficiently finds the optimal placement of the unknown read in a reference phylogenetic tree given a sequence alignment for the taxa in the tree. In short, the total resulting branch length for the tree is computed for every possible placement of the unknown read and the placement that gives the smallest value for this total is the best (optimal) choice. By taking advantage of computational efficiencies and mathematical formulations, we are able to find the true optimal ME placement for each read in the phylogenetic tree. Using computer simulations, we assessed the accuracy of the new approach for different read lengths over a variety of data sets and phylogenetic trees. We found the accuracy of the new method to be good and comparable to existing Maximum Likelihood (ML) approaches.
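The placement criterion described above (try every possible position for the read, compute the total branch length, keep the smallest) can be illustrated with a brute-force sketch. This is not the PhyClass implementation: it assumes the tree is given as an edge list, that pairwise distances between all sequences (including the read) are already available, and that branch lengths are re-estimated by ordinary least squares for each trial tree. The names `leaf_paths`, `solve`, `ols_tree_length`, and `place_read`, and the toy distances, are all hypothetical.

```python
def leaf_paths(edges, leaves):
    """Map each leaf pair to the set of edge indices on the path joining them."""
    adj = {}
    for idx, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, idx))
        adj.setdefault(v, []).append((u, idx))
    paths = {}
    for src in leaves:
        # Depth-first walk from src, remembering which edges were crossed.
        stack, seen = [(src, frozenset())], {src}
        while stack:
            node, used = stack.pop()
            if node in leaves and node != src:
                paths[frozenset((src, node))] = used
            for nxt, idx in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, used | {idx}))
    return paths

def solve(M, y):
    """Solve M x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(y)
    aug = [row[:] + [y[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ols_tree_length(edges, leaves, dist):
    """Sum of least-squares branch lengths for a fixed topology."""
    paths = leaf_paths(edges, leaves)
    pairs = list(paths)
    m = len(edges)
    # Row per leaf pair, column per edge: 1 if the edge lies on the pair's path.
    A = [[1.0 if e in paths[p] else 0.0 for e in range(m)] for p in pairs]
    d = [dist[p] for p in pairs]
    AtA = [[sum(r[i] * r[j] for r in A) for j in range(m)] for i in range(m)]
    Atd = [sum(r[i] * x for r, x in zip(A, d)) for i in range(m)]
    return sum(solve(AtA, Atd))

def place_read(edges, leaves, dist, query="Q"):
    """Attach the query to every edge in turn; return (edge, length) of the shortest tree."""
    best = None
    for k, (u, v) in enumerate(edges):
        # Split edge (u, v) with a new internal node and hang the query off it.
        trial = edges[:k] + edges[k + 1:] + [(u, "_new"), ("_new", v), ("_new", query)]
        length = ols_tree_length(trial, leaves | {query}, dist)
        if best is None or length < best[1]:
            best = ((u, v), length)
    return best

# Toy reference tree ((A,B),(C,D)) with internal nodes X and Y (assumed data).
ref_edges = [("A", "X"), ("B", "X"), ("X", "Y"), ("C", "Y"), ("D", "Y")]
ref_leaves = {"A", "B", "C", "D"}
# Additive distances generated from a tree in which Q branches off the A-X edge.
raw = {("A", "B"): 0.4, ("A", "C"): 0.6, ("A", "D"): 0.7, ("B", "C"): 0.6,
       ("B", "D"): 0.7, ("C", "D"): 0.3, ("Q", "A"): 0.2, ("Q", "B"): 0.4,
       ("Q", "C"): 0.6, ("Q", "D"): 0.7}
dist = {frozenset(pair): value for pair, value in raw.items()}

best_edge, best_length = place_read(ref_edges, ref_leaves, dist)
print(best_edge, round(best_length, 4))
```

In this toy case the distances are exactly additive, so the search should recover the edge the read was generated on. The sketch solves a fresh least-squares problem for every candidate edge, which is far more work than the efficient formulation the abstract alludes to; it is meant only to make the criterion concrete.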

CONCLUSIONS

In particular, we found that the consensus assignments based on ME and ML approaches are more correct than either method individually. This is true even when the statistical support for read assignments was low, which is inevitable given that individual reads are often short and come from only one gene.
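One simple reading of a "consensus assignment" is to accept a read's placement only when the ME and ML methods agree on it. The sketch below illustrates that reading; the abstract does not specify how the consensus was formed, so the agreement rule, the function name `consensus_assignments`, and the example labels are all assumptions.

```python
def consensus_assignments(me_calls, ml_calls):
    """Keep a read's label only when both methods agree; otherwise mark it None."""
    return {
        read: (label if ml_calls.get(read) == label else None)
        for read, label in me_calls.items()
    }

# Hypothetical per-read assignments from each method.
me_calls = {"read1": "clade_A", "read2": "clade_B", "read3": "clade_C"}
ml_calls = {"read1": "clade_A", "read2": "clade_D", "read3": "clade_C"}

agreed = consensus_assignments(me_calls, ml_calls)
print(agreed)
```

Reads marked `None` are those on which the two methods disagree; how to treat them (discard, or assign at a coarser taxonomic level) is a separate design choice not covered here.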
