Research progress of reduced amino acid alphabets in protein analysis and prediction.

Authors

Liang Yuchao, Yang Siqi, Zheng Lei, Wang Hao, Zhou Jian, Huang Shenghui, Yang Lei, Zuo Yongchun

Affiliations

State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China.

College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.

Publication information

Comput Struct Biotechnol J. 2022 Jul 4;20:3503-3510. doi: 10.1016/j.csbj.2022.07.001. eCollection 2022.

DOI:10.1016/j.csbj.2022.07.001
PMID:35860409
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9284397/
Abstract

Proteins are the executors of cellular physiological activities, and accurate elucidation of their structure and function is crucial for the refined mapping of proteins. As a feature engineering method, the reduction of amino acid composition is not only an important approach to protein structure and function analysis, but also opens a broad horizon for machine learning in this field. Representing sequences with fewer amino acid types greatly reduces the dimensionality, and with it the complexity and noise, of traditional feature engineering, and yields more interpretable predictive models that help machine learning capture key features. In this paper, we systematically review the strategies and methods behind reduced amino acid (RAA) alphabets and summarize their main applications in protein sequence alignment, functional classification, and prediction of structural properties. Finally, we give a comprehensive analysis of 672 RAA alphabets from 74 reduction methods.
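
The dimensionality reduction described in the abstract can be illustrated with a minimal sketch. The six-cluster grouping below is a hypothetical example based on broad physicochemical classes; it is an assumption made for illustration and is not claimed to be one of the 672 alphabets analysed in the review. The function names `reduce_sequence` and `composition` and the sample sequence are likewise illustrative.

```python
from collections import Counter

# Hypothetical 6-cluster grouping by broad physicochemical class.
# This is an assumption for illustration, not an alphabet from the review.
CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]

# Map every residue to the first letter of its cluster.
REDUCE = {aa: cluster[0] for cluster in CLUSTERS for aa in cluster}


def reduce_sequence(seq: str) -> str:
    """Rewrite a protein sequence in the reduced alphabet.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    return "".join(REDUCE[aa] for aa in seq.upper())


def composition(seq: str, alphabet: str) -> list[float]:
    """Return the frequency of each alphabet letter in seq."""
    counts = Counter(seq)
    return [counts[letter] / len(seq) for letter in alphabet]


reduced_alphabet = "".join(cluster[0] for cluster in CLUSTERS)  # "AFSKDG"
seq = "MKTAYIAKQR"  # made-up example sequence
reduced = reduce_sequence(seq)
features = composition(reduced, reduced_alphabet)

print(reduced)        # AKSAFAAKSK
print(len(features))  # 6 composition features instead of 20
```

With this grouping, a composition vector shrinks from 20 to 6 dimensions, and a dipeptide composition vector would shrink from 400 to 36, which is the kind of reduction the abstract refers to.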


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c19/9284397/62ef59a26e62/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c19/9284397/3536e7752045/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c19/9284397/2b5f57eb17be/gr3.jpg

Similar articles

1
Research progress of reduced amino acid alphabets in protein analysis and prediction.
Comput Struct Biotechnol J. 2022 Jul 4;20:3503-3510. doi: 10.1016/j.csbj.2022.07.001. eCollection 2022.
2
Automated alphabet reduction for protein datasets.
BMC Bioinformatics. 2009 Jan 6;10:6. doi: 10.1186/1471-2105-10-6.
3
RAACBook: a web server of reduced amino acid alphabet for sequence-dependent inference by using Chou's five-step rule.
Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz131.
4
Accuracy of sequence alignment and fold assessment using reduced amino acid alphabets.
Proteins. 2006 Jun 1;63(4):986-95. doi: 10.1002/prot.20881.
5
Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids.
Sci China C Life Sci. 2007 Jun;50(3):392-402. doi: 10.1007/s11427-007-0023-3.
6
Machine learning-based prediction of proteins' architecture using sequences of amino acids and structural alphabets.
J Biomol Struct Dyn. 2024 Mar 20:1-16. doi: 10.1080/07391102.2024.2328736.
7
cnnAlpha: Protein disordered regions prediction by reduced amino acid alphabets and convolutional neural networks.
Proteins. 2020 Nov;88(11):1472-1481. doi: 10.1002/prot.25966. Epub 2020 Aug 7.
8
Reduced amino acid alphabets exhibit an improved sensitivity and selectivity in fold assignment.
Bioinformatics. 2009 Jun 1;25(11):1356-62. doi: 10.1093/bioinformatics/btp164. Epub 2009 Apr 7.
9
Reduction of protein sequence complexity by residue grouping.
Protein Eng. 2003 May;16(5):323-30. doi: 10.1093/protein/gzg044.
10
iSP-RAAC: Identify Secretory Proteins of Malaria Parasite Using Reduced Amino Acid Composition.
Comb Chem High Throughput Screen. 2020;23(6):536-545. doi: 10.2174/1386207323666200402084518.

Cited by

1
A transcriptomic and proteomic analysis and comparison of human brain tissue from patients with and without epilepsy.
Sci Rep. 2025 May 11;15(1):16369. doi: 10.1038/s41598-025-00986-4.
2
Conotoxins: Classification, Prediction, and Future Directions in Bioinformatics.
Toxins (Basel). 2025 Feb 9;17(2):78. doi: 10.3390/toxins17020078.
3
Exploring Potential Diagnostic Biomarkers for Mechanical Asphyxia in the Heart Based on Proteomics Technology.
Int J Mol Sci. 2024 Nov 26;25(23):12710. doi: 10.3390/ijms252312710.
4
Lambda3: homology search for protein, nucleotide, and bisulfite-converted sequences.
Bioinformatics. 2024 Mar 4;40(3). doi: 10.1093/bioinformatics/btae097.
5
Protein language models meet reduced amino acid alphabets.
Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae061.
6
Snekmer: a scalable pipeline for protein sequence fingerprinting based on amino acid recoding.
Bioinform Adv. 2023 Feb 2;3(1):vbad005. doi: 10.1093/bioadv/vbad005. eCollection 2023.

References

1
RaacFold: a webserver for 3D visualization and analysis of protein structure by using reduced amino acid alphabets.
Nucleic Acids Res. 2022 Jul 5;50(W1):W633-W638. doi: 10.1093/nar/gkac415.
2
DFpin: Deep learning-based protein-binding site prediction with feature-based non-redundancy from RNA level.
Comput Biol Med. 2022 Mar;142:105216. doi: 10.1016/j.compbiomed.2022.105216. Epub 2022 Jan 7.
3
Identification of Disease-Related 2-Oxoglutarate/Fe (II)-Dependent Oxygenase Based on Reduced Amino Acid Cluster Strategy.
Front Cell Dev Biol. 2021 Jul 16;9:707938. doi: 10.3389/fcell.2021.707938. eCollection 2021.
4
Fast and sensitive taxonomic assignment to metagenomic contigs.
Bioinformatics. 2021 Sep 29;37(18):3029-3031. doi: 10.1093/bioinformatics/btab184.
5
IHEC_RAAC: a online platform for identifying human enzyme classes via reduced amino acid cluster strategy.
Amino Acids. 2021 Feb;53(2):239-251. doi: 10.1007/s00726-021-02941-9. Epub 2021 Jan 23.
6
Deep Learning in Proteomics.
Proteomics. 2020 Nov;20(21-22):e1900335. doi: 10.1002/pmic.201900335. Epub 2020 Oct 30.
7
cnnAlpha: Protein disordered regions prediction by reduced amino acid alphabets and convolutional neural networks.
Proteins. 2020 Nov;88(11):1472-1481. doi: 10.1002/prot.25966. Epub 2020 Aug 7.
8
RaacLogo: a new sequence logo generator by using reduced amino acid clusters.
Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa096.
9
A new feature selection algorithm based on relevance, redundancy and complementarity.
Comput Biol Med. 2020 Apr;119:103667. doi: 10.1016/j.compbiomed.2020.103667. Epub 2020 Feb 19.
10
Protein Contact Map Prediction Based on ResNet and DenseNet.
Biomed Res Int. 2020 Apr 6;2020:7584968. doi: 10.1155/2020/7584968. eCollection 2020.