MPI-PHYLIP: parallelizing computationally intensive phylogenetic analysis routines for the analysis of large protein families.

Affiliation

Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America.

Publication information

PLoS One. 2010 Nov 15;5(11):e13999. doi: 10.1371/journal.pone.0013999.

DOI:10.1371/journal.pone.0013999

PMID:21085574

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2981553/

Abstract

BACKGROUND

Phylogenetic study of protein sequences provides unique and valuable insights into the molecular and genetic basis of important medical and epidemiological problems as well as insights about the origins and development of physiological features in present day organisms. Consensus phylogenies based on the bootstrap and other resampling methods play a crucial part in analyzing the robustness of the trees produced for these analyses.
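To make the resampling step concrete, here is a minimal sketch of how one bootstrap replicate can be drawn from a protein alignment by sampling columns (sites) with replacement. The function name `bootstrap_alignment` and the toy sequences are hypothetical illustrations and are not taken from PHYLIP or from the paper.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Draw one bootstrap replicate by resampling alignment columns with replacement.

    `alignment` maps a taxon name to its aligned sequence; all sequences
    are assumed to have the same length.
    """
    n_sites = len(next(iter(alignment.values())))
    # Pick n_sites column indices at random, allowing repeats.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Rebuild every sequence from the same sampled columns so sites stay aligned.
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

# Toy alignment (made up for illustration only).
toy = {"taxonA": "MKVLA", "taxonB": "MKILA", "taxonC": "MRVLG"}
replicate = bootstrap_alignment(toy, random.Random(0))
```

A tree is then inferred from each replicate, and the consensus of those trees indicates how consistently each grouping is recovered.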

METHODOLOGY

Our focus was to increase the number of bootstrap replications that can be performed on large protein datasets using the maximum parsimony, distance matrix, and maximum likelihood methods. We have modified the PHYLIP package using MPI to enable large-scale phylogenetic study of protein sequences, using a statistically robust number of bootstrapped datasets, to be performed in a moderate amount of time. This paper discusses the methodology used to parallelize the PHYLIP programs and reports the performance of the parallel PHYLIP programs that are relevant to the study of protein evolution on several protein datasets.
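The abstract does not say how the work is divided among processes, so the following is only a sketch under the assumption that whole bootstrap replicates are handed out to MPI ranks in round-robin order. The helper `replicates_for_rank` is hypothetical. In a real MPI program the values of `rank` and `size` would come from the MPI runtime (for example `MPI.COMM_WORLD.Get_rank()` and `Get_size()` in mpi4py); here they are plain arguments so the sketch runs without MPI.

```python
def replicates_for_rank(n_replicates, rank, size):
    """Return the replicate indices one rank would process (round-robin split)."""
    if not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")
    # Rank r takes replicates r, r + size, r + 2*size, ...
    return list(range(rank, n_replicates, size))

# Example: 10 bootstrap replicates spread over 4 ranks.
assignment = {r: replicates_for_rank(10, r, 4) for r in range(4)}
# assignment == {0: [0, 4, 8], 1: [1, 5, 9], 2: [2, 6], 3: [3, 7]}
```

Because replicates are independent of one another, a split like this needs no communication until the per-replicate trees are gathered to build the consensus.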

CONCLUSIONS

Calculations that currently take a few days on a state of the art desktop workstation are reduced to calculations that can be performed over lunchtime on a modern parallel computer. Of the three protein methods tested, the maximum likelihood method scales the best, followed by the distance method, and then the maximum parsimony method. However, the maximum likelihood method requires significant memory resources, which limits its application to more moderately sized protein datasets.
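For readers unfamiliar with the term, "scales the best" is usually judged with the standard definitions of parallel speedup and efficiency. These are textbook definitions, not formulas quoted from the paper; the symbols $T(1)$ and $T(p)$ (wall-clock time on one and on $p$ processes) are introduced here for illustration.

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}
```

A method scales well when $E(p)$ stays close to 1 as $p$ grows.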
