Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified.

Author information

Keane Thomas M, Creevey Christopher J, Pentony Melissa M, Naughton Thomas J, McInerney James O

Affiliation

Department of Biology, National University of Ireland, Maynooth, Co. Kildare, Ireland.

Publication information

BMC Evol Biol. 2006 Mar 24;6:29. doi: 10.1186/1471-2148-6-29.

DOI: 10.1186/1471-2148-6-29
PMID: 16563161
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1435933/
Abstract

BACKGROUND

In recent years, model-based approaches such as maximum likelihood have become the methods of choice for constructing phylogenies. A number of authors have shown the importance of using adequate substitution models in order to produce accurate phylogenies. In the past, many empirical models of amino acid substitution have been derived using a variety of different methods and protein datasets. These matrices are normally used as surrogates, rather than deriving the maximum likelihood model from the dataset being examined. With few exceptions, selection between alternative matrices has been carried out in an ad hoc manner.

RESULTS

We start by highlighting the potential dangers of arbitrarily choosing protein models, using an empirical example in which a single alignment produces two topologically different and strongly supported phylogenies under two different arbitrarily chosen amino acid substitution models. We demonstrate that in simple simulations, statistical methods of model selection are indeed robust and likely to be useful for protein model selection. We have investigated patterns of amino acid substitution among homologous sequences from the three domains of life and our results show that no single amino acid matrix is optimal for any of the datasets. Perhaps most interestingly, we demonstrate that for two large datasets derived from the proteobacteria and archaea, one of the most favored models in both datasets is a model that was originally derived from retroviral Pol proteins.

CONCLUSION

This demonstrates that choosing protein models based on their source or method of construction may not be appropriate.
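
As a rough illustration of what statistical model selection looks like in practice, the sketch below ranks candidate amino acid matrices fitted to one alignment. The abstract does not name the criterion used, so this assumes the Akaike information criterion (AIC = 2k - 2 ln L), one common choice. The `Fit` class, the function names, and every log-likelihood and parameter count are hypothetical placeholders, not values from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fit:
    """One candidate substitution model fitted to a single alignment."""
    name: str        # matrix label (illustrative only)
    log_lik: float   # maximised log-likelihood (placeholder value)
    n_params: int    # free parameters estimated from the data


def aic(fit: Fit) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * fit.n_params - 2 * fit.log_lik


def rank_by_aic(fits):
    """Return (name, delta-AIC) pairs sorted from best to worst model."""
    scored = sorted(fits, key=aic)
    best = aic(scored[0])
    return [(f.name, aic(f) - best) for f in scored]


# Placeholder numbers chosen for illustration, NOT results from the paper.
candidates = [
    Fit("JTT", -10250.0, 45),
    Fit("WAG", -10230.0, 45),
    Fit("rtREV", -10225.0, 45),
    Fit("rtREV+G", -10100.0, 46),  # one extra parameter for rate variation
]

for name, delta in rank_by_aic(candidates):
    print(f"{name:10s} delta AIC = {delta:.1f}")
```

The point of ranking on a fitted criterion is that the matrix is chosen by how well it explains the alignment at hand, not by the taxa or method it was originally derived from.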

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/45f5/1435933/3d34748d46d8/1471-2148-6-29-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/45f5/1435933/0cd7481954c5/1471-2148-6-29-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/45f5/1435933/3a77ba090d59/1471-2148-6-29-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/45f5/1435933/bd263a9c825c/1471-2148-6-29-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/45f5/1435933/7644ffa5de96/1471-2148-6-29-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/45f5/1435933/e6a1c1aae573/1471-2148-6-29-6.jpg
