Protein database searches using compositionally adjusted substitution matrices.

Authors

Altschul Stephen F, Wootton John C, Gertz E Michael, Agarwala Richa, Morgulis Aleksandr, Schäffer Alejandro A, Yu Yi-Kuo

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Publication information

FEBS J. 2005 Oct;272(20):5101-9. doi: 10.1111/j.1742-4658.2005.04945.x.

DOI:10.1111/j.1742-4658.2005.04945.x

PMID:16218944

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1343503/

Abstract

Almost all protein database search methods use amino acid substitution matrices for scoring, optimizing, and assessing the statistical significance of sequence alignments. Much care and effort have therefore gone into constructing substitution matrices, and the quality of search results can depend strongly upon the choice of the proper matrix. A long-standing problem has been the comparison of sequences with biased amino acid compositions, for which standard substitution matrices are not optimal. To address this problem, we have recently developed a general procedure for transforming a standard matrix into one appropriate for the comparison of two sequences with arbitrary, and possibly differing, compositions. Such adjusted matrices yield, on average, improved alignments and alignment scores when applied to the comparison of proteins with markedly biased compositions. Here we review the application of compositionally adjusted matrices and consider whether they may also be applied fruitfully to general-purpose protein sequence database searches, in which related sequence pairs do not necessarily have strong compositional biases. Although it is not advisable to apply compositional adjustment indiscriminately, we describe several simple criteria under which invoking such adjustment is on average beneficial. In a typical database search, at least one of these criteria is satisfied by over half the related sequence pairs. Compositional substitution matrix adjustment is now available in NCBI's protein-protein version of BLAST.
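
The idea of transforming a standard matrix for two sequences with differing compositions can be illustrated with a small sketch. It is not the authors' implementation: it assumes a toy three-letter alphabet, invented target frequencies, and plain iterative proportional fitting as a simplified stand-in for the published optimization, which may impose further constraints that are omitted here. The function names `adjust_target_frequencies` and `log_odds_scores` are hypothetical. The sketch rescales the pair frequencies implied by a matrix so that their marginals match the two sequence compositions, then recomputes log-odds scores against those compositions.

```python
import math

def adjust_target_frequencies(q, row_comp, col_comp, iterations=200):
    """Rescale joint pair frequencies q so that rows sum to row_comp and
    columns sum to col_comp (iterative proportional fitting).
    Simplified stand-in, not the published procedure."""
    n = len(q)
    Q = [row[:] for row in q]
    for _ in range(iterations):
        # Match the row marginals to the first sequence's composition.
        for i in range(n):
            s = sum(Q[i])
            Q[i] = [x * row_comp[i] / s for x in Q[i]]
        # Match the column marginals to the second sequence's composition.
        for j in range(n):
            s = sum(Q[i][j] for i in range(n))
            for i in range(n):
                Q[i][j] *= col_comp[j] / s
    return Q

def log_odds_scores(Q, row_comp, col_comp, scale=2.0):
    """Log-odds scores s_ij = scale * log2(Q_ij / (p_i * p'_j)).
    With scale=2 the scores are in half-bit units."""
    n = len(Q)
    return [[scale * math.log2(Q[i][j] / (row_comp[i] * col_comp[j]))
             for j in range(n)] for i in range(n)]

# Invented target frequencies for a toy 3-letter alphabet (sum to 1).
q = [[0.20, 0.05, 0.05],
     [0.05, 0.20, 0.05],
     [0.05, 0.05, 0.30]]

# Assumed compositions of the two sequences being compared.
row_comp = [0.6, 0.2, 0.2]
col_comp = [0.5, 0.3, 0.2]

Q = adjust_target_frequencies(q, row_comp, col_comp)
scores = log_odds_scores(Q, row_comp, col_comp)

# Expected score per aligned pair under the background model.
expected = sum(row_comp[i] * col_comp[j] * scores[i][j]
               for i in range(3) for j in range(3))
```

Because the two compositions differ, the adjusted scores are generally not symmetric, and the expected score under the background model stays negative, as local alignment statistics require. In the BLAST+ command-line `blastp` program, composition-based adjustment is controlled by the `-comp_based_stats` option.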
