FastMG: a simple, fast, and accurate maximum likelihood procedure to estimate amino acid replacement rate matrices from large data sets.

Authors

Dang Cuong Cao, Le Vinh Sy, Gascuel Olivier, Hazes Bart, Le Quang Si

Affiliation

The Wellcome Trust Center for Human Genetics, Oxford University, Oxford, UK.

Publication information

BMC Bioinformatics. 2014 Oct 24;15(1):341. doi: 10.1186/1471-2105-15-341.

DOI:10.1186/1471-2105-15-341

PMID:25344302

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4287512/

Abstract

BACKGROUND

Amino acid replacement rate matrices are a crucial component of many protein analysis systems such as sequence similarity search, sequence alignment, and phylogenetic inference. Ideally, the rate matrix reflects the mutational behavior of the actual data under study; however, estimating amino acid replacement rate matrices requires large protein alignments and is computationally expensive and complex. As a compromise, sub-optimal pre-calculated generic matrices are typically used for protein-based phylogeny. Sequence availability has now grown to a point where problem-specific rate matrices can often be calculated if the computational cost can be controlled.

RESULTS

The most time-consuming step in estimating rate matrices by maximum likelihood is building maximum likelihood phylogenetic trees from protein alignments. We propose a new procedure, called FastMG, to overcome this obstacle. The key innovation is an alignment-splitting algorithm that splits alignments with many sequences into non-overlapping sub-alignments prior to estimating amino acid replacement rates. Experiments with different large data sets showed that the FastMG procedure was an order of magnitude faster than estimation without splitting. Importantly, there was no apparent loss in matrix quality when an appropriate splitting procedure was used.
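To make the splitting idea concrete, the following is a minimal Python sketch of partitioning one large alignment into non-overlapping sub-alignments. It is not the FastMG implementation: the abstract does not say how sequences are assigned to sub-alignments, so the random assignment used here is an assumption, and the function name `split_alignment`, the `max_seqs` parameter, and the dictionary representation of an alignment are all hypothetical.

```python
import random


def split_alignment(alignment, max_seqs, seed=0):
    """Partition an alignment into non-overlapping sub-alignments.

    `alignment` maps sequence names to aligned sequences of equal length.
    Each returned sub-alignment holds at most `max_seqs` sequences, and
    every sequence lands in exactly one sub-alignment.
    """
    if max_seqs < 1:
        raise ValueError("max_seqs must be at least 1")
    names = sorted(alignment)  # fixed starting order for reproducibility
    # Assumption: random assignment. The abstract only states that an
    # "appropriate" splitting procedure preserves matrix quality.
    random.Random(seed).shuffle(names)
    return [
        {name: alignment[name] for name in names[i:i + max_seqs]}
        for i in range(0, len(names), max_seqs)
    ]


# Toy alignment: ten identical six-column sequences.
toy = {f"seq{i}": "MKV-LA" for i in range(10)}
subs = split_alignment(toy, max_seqs=4)
print([len(sub) for sub in subs])  # prints [4, 4, 2]
```

In a pipeline of this kind, each sub-alignment would then get its own maximum likelihood tree before the replacement rates are estimated across all of them. A real splitter would likely also need to handle very small leftover groups, such as the two-sequence group in the toy output, which carry little phylogenetic signal.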

CONCLUSIONS

FastMG is a simple, fast and accurate procedure to estimate amino acid replacement rate matrices from large data sets. It enables researchers to study the evolutionary relationships for specific groups of proteins or taxa with optimized, data-specific amino acid replacement rate matrices. The programs, data sets, and the new mammalian mitochondrial protein rate matrix are available at http://fastmg.codeplex.com.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8626/4287512/b4c11b17e0bf/12859_2014_6772_Fig1_HTML.jpg

Similar articles

Data-specific substitution models improve protein-based phylogenetics.

PeerJ. 2023 Aug 8;11:e15716. doi: 10.7717/peerj.15716. eCollection 2023.

QMaker: Fast and Accurate Method to Estimate Empirical Models of Protein Evolution.

Syst Biol. 2021 Aug 11;70(5):1046-1060. doi: 10.1093/sysbio/syab010.

On the quality of tree-based protein classification.

Bioinformatics. 2005 May 1;21(9):1876-90. doi: 10.1093/bioinformatics/bti244. Epub 2005 Jan 12.

Modeling protein evolution with several amino acid replacement matrices depending on site rates.

Mol Biol Evol. 2012 Oct;29(10):2921-36. doi: 10.1093/molbev/mss112. Epub 2012 Apr 6.

Scoredist: a simple and robust protein sequence distance estimator.

BMC Bioinformatics. 2005 Apr 27;6:108. doi: 10.1186/1471-2105-6-108.

Phylogenetic mixture models for proteins.

Philos Trans R Soc Lond B Biol Sci. 2008 Dec 27;363(1512):3965-76. doi: 10.1098/rstb.2008.0180.

ReplacementMatrix: a web server for maximum-likelihood estimation of amino acid replacement rate matrices.

Bioinformatics. 2011 Oct 1;27(19):2758-60. doi: 10.1093/bioinformatics/btr435. Epub 2011 Jul 26.

SATe-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees.

Syst Biol. 2012 Jan;61(1):90-106. doi: 10.1093/sysbio/syr095. Epub 2011 Dec 1.

An amino acid substitution-selection model adjusts residue fitness to improve phylogenetic estimation.

Mol Biol Evol. 2014 Apr;31(4):779-92. doi: 10.1093/molbev/msu044. Epub 2014 Jan 16.

Cited by

Developing and Applying RNA Empirical Models With Secondary Structure Insights for Orthoptera Phylogenetics.

Ecol Evol. 2025 Aug 31;15(9):e72068. doi: 10.1002/ece3.72068. eCollection 2025 Sep.

Ultrafast classical phylogenetic method beats large protein language models on variant effect prediction.

Adv Neural Inf Process Syst. 2024;37:130265-130290.

nT4X and nT4M: Novel Time Non-reversible Mixture Amino Acid Substitution Models.

J Mol Evol. 2025 Feb;93(1):136-148. doi: 10.1007/s00239-024-10230-8. Epub 2025 Jan 20.

Data-specific substitution models improve protein-based phylogenetics.

PeerJ. 2023 Aug 8;11:e15716. doi: 10.7717/peerj.15716. eCollection 2023.

nQMaker: Estimating Time Nonreversible Amino Acid Substitution Models.

Syst Biol. 2022 Aug 10;71(5):1110-1123. doi: 10.1093/sysbio/syac007.

MtOrt: an empirical mitochondrial amino acid substitution model for evolutionary studies of Orthoptera insects.

BMC Evol Biol. 2020 May 19;20(1):57. doi: 10.1186/s12862-020-01623-6.

mtProtEvol: the resource presenting molecular evolution analysis of proteins involved in the function of Vertebrate mitochondria.

BMC Evol Biol. 2019 Feb 26;19(Suppl 1):47. doi: 10.1186/s12862-019-1371-x.

Improved mitochondrial amino acid substitution models for metazoan evolutionary studies.

BMC Evol Biol. 2017 Jun 12;17(1):136. doi: 10.1186/s12862-017-0987-y.

References

ReplacementMatrix: a web server for maximum-likelihood estimation of amino acid replacement rate matrices.

Bioinformatics. 2011 Oct 1;27(19):2758-60. doi: 10.1093/bioinformatics/btr435. Epub 2011 Jul 26.

New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0.

Syst Biol. 2010 May;59(3):307-21. doi: 10.1093/sysbio/syq010. Epub 2010 Mar 29.

FLU, an amino acid substitution model for influenza proteins.

BMC Evol Biol. 2010 Apr 12;10:99. doi: 10.1186/1471-2148-10-99.

FastTree 2--approximately maximum-likelihood trees for large alignments.

PLoS One. 2010 Mar 10;5(3):e9490. doi: 10.1371/journal.pone.0009490.

Fast embedding methods for clustering tens of thousands of sequences.

Comput Biol Chem. 2008 Aug;32(4):282-6. doi: 10.1016/j.compbiolchem.2008.03.005. Epub 2008 Mar 26.

Phylogeny.fr: robust phylogenetic analysis for the non-specialist.

Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W465-9. doi: 10.1093/nar/gkn180. Epub 2008 Apr 19.

An improved general amino acid replacement matrix.

Mol Biol Evol. 2008 Jul;25(7):1307-20. doi: 10.1093/molbev/msn067. Epub 2008 Mar 26.

XRate: a fast prototyping, training and annotation tool for phylo-grammars.

BMC Bioinformatics. 2006 Oct 3;7:428. doi: 10.1186/1471-2105-7-428.

Maximum likelihood of evolutionary trees: hardness and approximation.

Bioinformatics. 2005 Jun;21 Suppl 1:i97-106. doi: 10.1093/bioinformatics/bti1027.

RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees.

Bioinformatics. 2005 Feb 15;21(4):456-63. doi: 10.1093/bioinformatics/bti191. Epub 2004 Dec 17.
