Reduced amino acid alphabets exhibit an improved sensitivity and selectivity in fold assignment.

Author information

Peterson Eric L, Kondev Jané, Theriot Julie A, Phillips Rob

Affiliation

Department of Physics, California Institute of Technology, Pasadena, CA 91125, USA.

Publication information

Bioinformatics. 2009 Jun 1;25(11):1356-62. doi: 10.1093/bioinformatics/btp164. Epub 2009 Apr 7.

DOI:10.1093/bioinformatics/btp164

PMID:19351620

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2732308/

Abstract

MOTIVATION

Many proteins with vastly dissimilar sequences are found to share a common fold, as evidenced in the wealth of structures now available in the Protein Data Bank. One idea that has found success in various applications is the concept of a reduced amino acid alphabet, wherein similar amino acids are clustered together. Given the structural similarity exhibited by many apparently dissimilar sequences, we undertook this study looking for improvements in fold recognition by comparing protein sequences written in a reduced alphabet.
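
The clustering idea can be illustrated with a short Python sketch. The six-group partition `GROUPS` below is a hypothetical example chosen for readability; it is not one of the 150+ schemes evaluated in the study. Each residue is rewritten as the first letter of its group, so two sequences that differ only by within-group substitutions become identical once reduced.

```python
# Hypothetical 6-group partition of the 20 standard amino acids.
# Illustrative only: not taken from any specific published scheme.
GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]


def build_reduction_map(groups):
    """Map each residue to its group's representative (the group's first letter)."""
    mapping = {}
    for group in groups:
        for residue in group:
            if residue in mapping:
                raise ValueError(f"residue {residue!r} appears in more than one group")
            mapping[residue] = group[0]
    return mapping


REDUCE = build_reduction_map(GROUPS)


def reduce_sequence(seq, mapping=REDUCE):
    """Rewrite a protein sequence in the reduced alphabet (KeyError on unknown codes)."""
    return "".join(mapping[residue] for residue in seq.upper())


if __name__ == "__main__":
    print(reduce_sequence("MKTAYIAK"))  # AKSAFAAK
    # Two sequences with no identical positions become identical after reduction.
    print(reduce_sequence("KDS") == reduce_sequence("RET"))  # True
```

"KDS" and "RET" share no identical positions in the full alphabet, yet both reduce to "KDS". Ambiguity codes such as `X` are not in the map and raise `KeyError`; a real pipeline would need an explicit policy for them.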

RESULTS

We tested over 150 of the amino acid clustering schemes proposed in the literature with all-versus-all pairwise sequence alignments of sequences in the Distance mAtrix aLIgnment database. We combined several metrics from information retrieval popular in the literature: mean precision, area under the Receiver Operating Characteristic curve and recall at a fixed error rate and found that, in contrast to previous work, reduced alphabets in many cases outperform full alphabets. We find that reduced alphabets can perform at a level comparable to full alphabets in correct pairwise alignment of sequences and can show increased sensitivity to pairs of sequences with structural similarity but low-sequence identity. Based on these results, we hypothesize that reduced alphabets may also show performance gains with more sophisticated methods such as profile and pattern searches.
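
The three retrieval metrics named above can be sketched in Python for a single ranked list of alignment scores, where each label marks whether the pair truly shares a fold. These are generic textbook definitions assumed for illustration; the paper's exact conventions (for example, averaging per query, or how the fixed error rate is set) may differ, and the function names are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random true pair outscores a random
    false pair (ties count as 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one true and one false pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def _ranked(scores, labels):
    """Labels ordered by decreasing alignment score (ties keep input order)."""
    return [y for _, y in sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)]


def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each true pair."""
    hits, total = 0, 0.0
    for rank, y in enumerate(_ranked(scores, labels), start=1):
        if y:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def recall_at_false_positives(scores, labels, max_fp):
    """Fraction of true pairs retrieved before the (max_fp + 1)-th false pair."""
    n_pos = sum(1 for y in labels if y)
    tp = fp = 0
    for y in _ranked(scores, labels):
        if y:
            tp += 1
        else:
            fp += 1
            if fp > max_fp:
                break
    return tp / n_pos if n_pos else 0.0


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5]
    labels = [True, False, True, True, False]
    print(roc_auc(scores, labels))                       # ~0.667 (4 of 6 pairs)
    print(average_precision(scores, labels))             # ~0.806
    print(recall_at_false_positives(scores, labels, 0))  # ~0.333
    print(recall_at_false_positives(scores, labels, 1))  # 1.0
```

Computing AUC by pairwise comparison is quadratic but makes the definition explicit; for large all-versus-all runs a rank-based formula or a library routine would be preferable.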

AVAILABILITY

A table of results, together with the substitution matrices and residue groupings used in this study, can be downloaded from http://www.rpgroup.caltech.edu/publications/supplements/alphabets.

Cited by

Bag-of-words is competitive with sum-of-embeddings language-inspired representations on protein inference.

PLoS One. 2025 Aug 6;20(8):e0325531. doi: 10.1371/journal.pone.0325531. eCollection 2025.

Limits on inferring T cell specificity from partial information.

Proc Natl Acad Sci U S A. 2024 Oct 15;121(42):e2408696121. doi: 10.1073/pnas.2408696121. Epub 2024 Oct 7.

Discovery of antimicrobial peptides in the global microbiome with machine learning.

Cell. 2024 Jul 11;187(14):3761-3778.e16. doi: 10.1016/j.cell.2024.05.013. Epub 2024 Jun 5.

Accurately identifying hemagglutinin using sequence information and machine learning methods.

Front Med (Lausanne). 2023 Oct 31;10:1281880. doi: 10.3389/fmed.2023.1281880. eCollection 2023.

Computational exploration of the global microbiome for antibiotic discovery.

bioRxiv. 2023 Sep 11:2023.08.31.555663. doi: 10.1101/2023.08.31.555663.

A Tale of Loops and Tails: The Role of Intrinsically Disordered Protein Regions in R-Loop Recognition and Phase Separation.

Front Mol Biosci. 2021 Jun 10;8:691694. doi: 10.3389/fmolb.2021.691694. eCollection 2021.

The language of proteins: NLP, machine learning & protein sequences.

Comput Struct Biotechnol J. 2021 Mar 25;19:1750-1758. doi: 10.1016/j.csbj.2021.03.022. eCollection 2021.

Evolution as a Guide to Designing Amino Acid Alphabets.

Int J Mol Sci. 2021 Mar 10;22(6):2787. doi: 10.3390/ijms22062787.

A Simplified Amino Acidic Alphabet to Unveil the T-Cells Receptors Antigens: A Computational Perspective.

Front Chem. 2021 Feb 25;9:598802. doi: 10.3389/fchem.2021.598802. eCollection 2021.

dagLogo: An R/Bioconductor package for identifying and visualizing differential amino acid group usage in proteomics data.

PLoS One. 2020 Nov 6;15(11):e0242030. doi: 10.1371/journal.pone.0242030. eCollection 2020.
