An improved general amino acid replacement matrix.

Authors

Le Si Quang, Gascuel Olivier

Affiliation

Méthodes et Algorithmes pour la Bioinformatique, LIRMM, CNRS, Université Montpellier II, Montpellier, France.

Publication information

Mol Biol Evol. 2008 Jul;25(7):1307-20. doi: 10.1093/molbev/msn067. Epub 2008 Mar 26.

DOI: 10.1093/molbev/msn067
PMID: 18367465

Abstract

Amino acid replacement matrices are an essential basis of protein phylogenetics. They are used to compute substitution probabilities along phylogeny branches and thus the likelihood of the data. They are also essential in protein alignment. A number of replacement matrices and methods to estimate these matrices from protein alignments have been proposed since the seminal work of Dayhoff et al. (1972). An important advance was achieved by Whelan and Goldman (2001) and their WAG matrix, thanks to an efficient maximum likelihood estimation approach that accounts for the phylogenies of sequences within each training alignment. We further refine this method by incorporating the variability of evolutionary rates across sites in the matrix estimation and using a much larger and diverse database than BRKALN, which was used to estimate WAG. To estimate our new matrix (called LG after the authors), we use an adaptation of the XRATE software and 3,912 alignments from Pfam, comprising approximately 50,000 sequences and approximately 6.5 million residues overall. To evaluate the LG performance, we use an independent sample consisting of 59 alignments from TreeBase and randomly divide Pfam alignments into 3,412 training and 500 test alignments. The comparison with WAG and JTT shows a clear likelihood improvement. With TreeBase, we find that 1) the average Akaike information criterion gain per site is 0.25 and 0.42, when compared with WAG and JTT, respectively; 2) LG is significantly better than WAG for 38 alignments (among 59), and significantly worse with 2 alignments only; and 3) tree topologies inferred with LG, WAG, and JTT frequently differ, indicating that using LG impacts not only the likelihood value but also the output tree. Results with the test alignments from Pfam are analogous. LG and a PHYML implementation can be downloaded from http://atgc.lirmm.fr/LG.
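
The per-site Akaike information criterion (AIC) gain reported in the abstract can be illustrated with a minimal sketch. It uses the standard definition AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood. The function names and the log-likelihood values below are hypothetical, chosen only so that the arithmetic is easy to follow; they are not taken from the paper.

```python
def aic(log_likelihood: float, n_free_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_free_params - 2.0 * log_likelihood


def aic_gain_per_site(lnl_reference: float, lnl_candidate: float,
                      n_sites: int, k_reference: int = 0,
                      k_candidate: int = 0) -> float:
    """Per-site AIC gain of a candidate model over a reference model.

    A positive value means the candidate has the lower (better) AIC.
    When both models have the same number of free parameters, the
    2k terms cancel and the gain is 2 * (lnL_candidate - lnL_reference)
    divided by the number of sites.
    """
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    gain = aic(lnl_reference, k_reference) - aic(lnl_candidate, k_candidate)
    return gain / n_sites


# Hypothetical example: one alignment of 400 sites, scored under a
# reference matrix and a candidate matrix with equal parameter counts.
example_gain = aic_gain_per_site(lnl_reference=-12050.0,
                                 lnl_candidate=-12000.0,
                                 n_sites=400)
```

With these made-up inputs, a log-likelihood improvement of 50 units over 400 sites corresponds to a per-site AIC gain of 0.25. The figures in the abstract are averages of this kind of quantity across alignments.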

Similar articles

1
An improved general amino acid replacement matrix.
Mol Biol Evol. 2008 Jul;25(7):1307-20. doi: 10.1093/molbev/msn067. Epub 2008 Mar 26.
2
Phylogenetic mixture models for proteins.
Philos Trans R Soc Lond B Biol Sci. 2008 Dec 27;363(1512):3965-76. doi: 10.1098/rstb.2008.0180.
3
ReplacementMatrix: a web server for maximum-likelihood estimation of amino acid replacement rate matrices.
Bioinformatics. 2011 Oct 1;27(19):2758-60. doi: 10.1093/bioinformatics/btr435. Epub 2011 Jul 26.
4
Accounting for solvent accessibility and secondary structure in protein phylogenetics is clearly beneficial.
Syst Biol. 2010 May;59(3):277-87. doi: 10.1093/sysbio/syq002. Epub 2010 Mar 10.
5
Pandit: a database of protein and associated nucleotide domains with inferred trees.
Bioinformatics. 2003 Aug 12;19(12):1556-63. doi: 10.1093/bioinformatics/btg188.
6
A class frequency mixture model that adjusts for site-specific amino acid frequencies and improves inference of protein phylogeny.
BMC Evol Biol. 2008 Dec 16;8:331. doi: 10.1186/1471-2148-8-331.
7
Scoredist: a simple and robust protein sequence distance estimator.
BMC Bioinformatics. 2005 Apr 27;6:108. doi: 10.1186/1471-2105-6-108.
8
A collection of amino acid replacement matrices derived from clusters of orthologs.
J Mol Evol. 2005 Nov;61(5):659-65. doi: 10.1007/s00239-005-0060-0. Epub 2005 Oct 20.
9
libcov: a C++ bioinformatic library to manipulate protein structures, sequence alignments and phylogeny.
BMC Bioinformatics. 2005 Jun 6;6:138. doi: 10.1186/1471-2105-6-138.
10
QMaker: Fast and Accurate Method to Estimate Empirical Models of Protein Evolution.
Syst Biol. 2021 Aug 11;70(5):1046-1060. doi: 10.1093/sysbio/syab010.

Cited by

1
The Pathophysiological Functions of Heparanases: From Evolution, Structural and Tissue-Specific Perspectives.
FASEB J. 2025 Sep 15;39(17):e70976. doi: 10.1096/fj.202501859R.
2
Diversity of RNA Viruses and Circular Viroid-like Elements in spp. in Near-Natural Forests of Bosnia and Herzegovina.
Viruses. 2025 Aug 20;17(8):1144. doi: 10.3390/v17081144.
3
Evolutionary analysis reveals the origin of sodium coupling in glutamate transporters.
Nat Struct Mol Biol. 2025 Aug 25. doi: 10.1038/s41594-025-01652-z.
4
Convergent Evolution in Amblyopsid Cavefishes and the Age of Eastern North American Subterranean Ecosystems.
Mol Biol Evol. 2025 Jul 30;42(8). doi: 10.1093/molbev/msaf185.
5
Intron turnover of slc26a1 and slc26a2 and convergence of intron insertion sites.
Sci Rep. 2025 Aug 16;15(1):30007. doi: 10.1038/s41598-025-15147-w.
6
Future Sequon Finder - A novel approach for predicting future N-linked glycosylation sequon locations on viral surface proteins.
PLoS One. 2025 Aug 14;20(8):e0328174. doi: 10.1371/journal.pone.0328174. eCollection 2025.
7
Reduced Amino Acid Substitution Matrices Find Traces of Ancient Coding Alphabets in Modern Day Proteins.
Mol Biol Evol. 2025 Sep 1;42(9). doi: 10.1093/molbev/msaf197.
8
Hydrogen Oxidation Benefits Alphaproteobacterial Methanotrophs Under Severe Methane Limitation.
Environ Microbiol. 2025 Aug;27(8):e70163. doi: 10.1111/1462-2920.70163.
9
Universal orthologs infer deep phylogenies and improve genome quality assessments.
BMC Biol. 2025 Jul 28;23(1):224. doi: 10.1186/s12915-025-02328-2.
10
Transcriptome analysis of Spodoptera RNA-seq data unveils new viruses within the family Rhabdoviridae.
Virus Genes. 2025 Jul 27. doi: 10.1007/s11262-025-02177-9.