Amino Acid Encoding Methods for Protein Sequences: A Comprehensive Review and Assessment.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2020 Nov-Dec;17(6):1918-1931. doi: 10.1109/TCBB.2019.2911677. Epub 2020 Dec 8.

DOI: 10.1109/TCBB.2019.2911677
PMID: 30998480

Abstract

As the first step of machine-learning-based protein structure and function prediction, amino acid encoding plays a fundamental role in the final success of those methods. Unlike protein sequence encoding, amino acid encoding can be used in both residue-level and sequence-level prediction of protein properties by combining it with different algorithms. However, it has not attracted enough attention in the past decades, and there have been no comprehensive reviews or assessments of encoding methods so far. In this article, we make a systematic classification and present a comprehensive review and assessment of various amino acid encoding methods. These methods are grouped into five categories according to their information sources and information extraction methodologies: binary encoding, physicochemical properties encoding, evolution-based encoding, structure-based encoding, and machine-learning encoding. Then, 16 representative methods from the five categories are selected and compared on protein secondary structure prediction and protein fold recognition tasks using large-scale benchmark datasets. The results show that the evolution-based position-dependent encoding method PSSM achieved the best performance, and that the structure-based and machine-learning encoding methods also show some potential for further application; the neural-network-based distributed representation of amino acids in particular may shed new light on this area. We hope that the review and assessment are useful for future studies of amino acid encoding.
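To make the first two categories concrete, here is a minimal Python sketch of a binary (one-hot) encoding and a single-property physicochemical encoding. It is an illustration under stated assumptions, not the setup evaluated in the article: the residue ordering, the helper names `one_hot_encode` and `hydropathy_encode`, the all-zero handling of unknown residues, and the choice of the Kyte-Doolittle hydropathy index as the example property are all assumptions made here.

```python
# Illustrative sketch only. Alphabet order, helper names, and the choice of
# the Kyte-Doolittle scale are assumptions, not taken from the reviewed article.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# Kyte-Doolittle hydropathy index, used as one example physicochemical property.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def one_hot_encode(sequence):
    """Binary encoding: one 20-dimensional 0/1 vector per residue."""
    encoded = []
    for aa in sequence.upper():
        vec = [0] * len(AMINO_ACIDS)
        if aa in INDEX:  # unknown residues (e.g. 'X') stay all-zero
            vec[INDEX[aa]] = 1
        encoded.append(vec)
    return encoded


def hydropathy_encode(sequence):
    """Physicochemical encoding: one scalar per residue (0.0 if unknown)."""
    return [KD_HYDROPATHY.get(aa, 0.0) for aa in sequence.upper()]
```

Both functions return one feature vector (or scalar) per residue, so the output can feed a residue-level predictor directly or be pooled for sequence-level tasks. Position-dependent encodings such as PSSM differ in that the vector for a residue also depends on an alignment of homologous sequences, which this sketch does not attempt.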
