MPI-PHYLIP: parallelizing computationally intensive phylogenetic analysis routines for the analysis of large protein families.

Affiliation

Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America.

Publication information

PLoS One. 2010 Nov 15;5(11):e13999. doi: 10.1371/journal.pone.0013999.

Abstract

BACKGROUND

Phylogenetic study of protein sequences provides unique and valuable insights into the molecular and genetic basis of important medical and epidemiological problems, as well as into the origins and development of physiological features in present-day organisms. Consensus phylogenies based on the bootstrap and other resampling methods play a crucial part in assessing the robustness of the trees produced by these analyses.
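A nonparametric bootstrap builds each replicate dataset by resampling alignment columns (sites) with replacement. Support for a clade is then the fraction of replicate trees that contain it, and a majority-rule consensus keeps the clades found in more than half of them. In PHYLIP, SEQBOOT generates the replicate datasets and CONSENSE builds the consensus tree. The Python sketch below is not PHYLIP code. It only illustrates these two steps on a made-up four-taxon alignment. The function names are hypothetical, and the replicate trees are supplied by hand rather than inferred.

```python
import random
from collections import Counter

def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling alignment columns with replacement.

    alignment: dict mapping taxon name -> aligned protein sequence (all the same length).
    rng: a random.Random instance, so replicates are reproducible.
    """
    n_sites = len(next(iter(alignment.values())))
    # Draw n_sites column indices with replacement; some columns repeat, others drop out.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}

def clade_support(replicate_clades):
    """Fraction of replicate trees containing each clade.

    replicate_clades: one set per replicate tree; each set holds that tree's clades
    as frozensets of taxon names (trees treated as rooted for simplicity).
    """
    counts = Counter(clade for clades in replicate_clades for clade in clades)
    n = len(replicate_clades)
    return {clade: k / n for clade, k in counts.items()}

def majority_rule(support, threshold=0.5):
    """Clades found in more than `threshold` of the replicates."""
    return {clade for clade, freq in support.items() if freq > threshold}

# Toy alignment, made up for illustration.
aln = {"A": "MKTAYIAK", "B": "MKTAHIAK", "C": "MRTAYLAK", "D": "MRSAYLGK"}
rep = bootstrap_alignment(aln, random.Random(42))

# Clades of three hypothetical replicate trees.
trees = [
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
]
support = clade_support(trees)
print(majority_rule(support))  # keeps {A,B} (support 1.0) and {C,D} (support 2/3)
```

Passing an explicit `random.Random` instance keeps each replicate reproducible, which matters once replicates are generated on different processors.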

METHODOLOGY

Our focus was to increase the number of bootstrap replicates that can be performed on large protein datasets using the maximum parsimony, distance matrix, and maximum likelihood methods. We modified the PHYLIP package with MPI so that large-scale phylogenetic studies of protein sequences, using a statistically robust number of bootstrapped datasets, can be performed in a moderate amount of time. This paper discusses the methodology used to parallelize the PHYLIP programs and reports, on several protein datasets, the performance of the parallel PHYLIP programs relevant to the study of protein evolution.
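Bootstrap replicates are independent of one another, so one common way to parallelize them with MPI is to give each process (rank) a contiguous block of replicate indices. The results are then gathered on rank 0. The abstract does not say whether MPI-PHYLIP uses exactly this scheme, so treat the following as a sketch under that assumption. In a real MPI program, `rank` and `size` would come from `MPI_Comm_rank`/`MPI_Comm_size` (or `comm.Get_rank()`/`comm.Get_size()` in mpi4py), and a gather would collect the results. Here a plain loop stands in for the processes so the sketch runs without MPI. The function names and the per-replicate seeding rule are hypothetical.

```python
import random

def replicate_block(n_replicates, rank, size):
    """Half-open range of replicate indices assigned to `rank` out of `size` processes.

    Contiguous block distribution: the first n_replicates % size ranks take one extra.
    """
    base, extra = divmod(n_replicates, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)

def run_rank(rank, size, n_replicates, base_seed, work):
    """Work done by one process. Each replicate gets a seed derived from its index,
    so results do not depend on how many processes are used."""
    results = []
    for i in replicate_block(n_replicates, rank, size):
        rng = random.Random(base_seed * 1_000_003 + i)  # hypothetical seeding scheme
        results.append((i, work(rng)))
    return results

def simulate(n_replicates, size, base_seed, work):
    """Run every rank in turn and merge results in replicate order.
    In a real MPI program the ranks run concurrently and a gather to rank 0 replaces this loop."""
    gathered = []
    for rank in range(size):
        gathered.extend(run_rank(rank, size, n_replicates, base_seed, work))
    gathered.sort(key=lambda item: item[0])
    return [result for _, result in gathered]

# Stand-in for "analyze one bootstrap replicate": here just a random draw.
work = lambda rng: rng.random()
print(simulate(100, 4, 7, work) == simulate(100, 1, 7, work))  # True
```

Seeding each replicate by its index, rather than each rank, is a design choice that makes the answer identical whether the job runs on 1 or 100 processors. This is convenient when checking a parallel port against the serial program.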

CONCLUSIONS

Calculations that currently take a few days on a state-of-the-art desktop workstation can be completed over lunchtime on a modern parallel computer. Of the three protein methods tested, maximum likelihood scales best, followed by the distance method and then maximum parsimony. However, the maximum likelihood method requires significant memory, which limits its application to moderately sized protein datasets.
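Scaling is usually reported as speedup (serial time divided by parallel time) and parallel efficiency (speedup per processor). Amdahl's law gives the standard upper bound on speedup when part of a program remains serial. This sketch only defines those textbook quantities. The timings are hypothetical placeholders chosen to mirror "a few days" versus "lunchtime" and are not results from the paper.

```python
def speedup(t_serial, t_parallel):
    """Observed speedup: serial wall time divided by parallel wall time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency: speedup per processor (1.0 means perfect scaling)."""
    return speedup(t_serial, t_parallel) / n_procs

def amdahl_speedup(serial_fraction, n_procs):
    """Amdahl's law: best-case speedup when `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

# Hypothetical numbers, not measurements from the paper: 72 h serial, 1 h on 128 processors.
print(speedup(72.0, 1.0))          # 72.0
print(efficiency(72.0, 1.0, 128))  # 0.5625
# Even a 1% serial fraction caps speedup at 100, however many processors are used.
print(amdahl_speedup(0.01, 128) < amdahl_speedup(0.01, 10**6) < 100)  # True
```

Under this model, a method whose per-replicate work is large relative to its serial setup and communication will show higher efficiency at high processor counts.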

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/759f/2981553/0d2a197ae224/pone.0013999.g001.jpg
