Investigation and identification of protein carbonylation sites based on position-specific amino acid composition and physicochemical features.

Authors

Weng Shun-Long, Huang Kai-Yao, Kaunang Fergie Joanda, Huang Chien-Hsun, Kao Hui-Ju, Chang Tzu-Hao, Wang Hsin-Yao, Lu Jang-Jih, Lee Tzong-Yi

Affiliations

Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsin-Chu, 300, Taiwan.

Mackay Medicine, Nursing and Management College, Taipei, 112, Taiwan.

Publication

BMC Bioinformatics. 2017 Mar 14;18(Suppl 3):66. doi: 10.1186/s12859-017-1472-8.

DOI: 10.1186/s12859-017-1472-8
PMID: 28361707
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5374553/
Abstract

BACKGROUND

Protein carbonylation, an irreversible and non-enzymatic post-translational modification (PTM), is often used as a marker of oxidative stress. When reactive oxygen species (ROS) oxidize amino acid side chains, carbonyl (C=O) groups are produced, especially on lysine (K), arginine (R), threonine (T) and proline (P). However, because little is known about the substrate specificity of carbonylation, we set out to develop a systematic method for comprehensively investigating protein carbonylation sites.
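Sequence-based site predictors usually start by cutting a fixed-length window around every candidate residue, here the K, R, T and P residues named above. The sketch below illustrates that step only; it is not the authors' code, and the window half-width of 10 (a 21-residue window) and the `X` padding character are assumptions, since the abstract states neither.

```python
# Residues named in the abstract as the main carbonylation targets.
CANDIDATES = "KRTP"


def extract_windows(sequence, half_width=10, pad="X"):
    """Return (position, residue, window) for every candidate residue.

    Positions are 1-based. Each window holds `half_width` residues on either
    side of the centre; positions past the sequence ends are filled with
    `pad`. The default half_width=10 is an illustrative assumption.
    """
    padded = pad * half_width + sequence + pad * half_width
    windows = []
    for i, residue in enumerate(sequence):
        if residue in CANDIDATES:
            # sequence[i] sits at padded[i + half_width], so this slice
            # is centred on the candidate residue.
            window = padded[i : i + 2 * half_width + 1]
            windows.append((i + 1, residue, window))
    return windows


if __name__ == "__main__":
    for pos, res, win in extract_windows("MAKPLTRG", half_width=3):
        print(pos, res, win)
```

For the toy sequence `MAKPLTRG` with `half_width=3`, this yields one 7-residue window for each K, P, T and R, the first being `3 K XMAKPLT`. In practice each residue type would get its own set of windows, matching the per-residue (K, R, T, P) models reported in the Results.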

RESULTS

After removing redundant data collected from multiple carbonylation-related articles, a total of 226 human carbonylated proteins were used as the training dataset, comprising 307, 126, 128 and 129 carbonylation sites on K, R, T and P residues, respectively. To identify features useful for predicting carbonylation sites, the linear amino acid sequence was used not only to build predictive models from the training dataset but also to compare its predictive effectiveness with other feature types: amino acid composition (AAC), amino acid pair composition (AAPC), position-specific scoring matrix (PSSM), positional weighted matrix (PWM), solvent-accessible surface area (ASA) and physicochemical properties (AAindex). Analysis of position-specific amino acid composition revealed that the positively charged amino acids K and R are markedly enriched around carbonylated sites, which may play a functional role in distinguishing carbonylation from non-carbonylation sites. A variety of predictive models were built using these features and three different machine learning methods. In five-fold cross-validation, models trained with the PWM feature achieved higher sensitivity on the positive training data, whereas models trained with the AAindex feature achieved higher specificity on the negative training data. Models trained with hybrid features (PWM, AAC and AAindex) obtained the best MCC values: 0.432, 0.472, 0.443 and 0.467 for K, R, T and P residues, respectively.
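To make two of the feature types above concrete, here is a minimal sketch of amino acid composition (AAC), a position weight matrix (PWM) score, and their concatenation into a hybrid vector. These are generic textbook definitions, not necessarily the paper's exact formulations: the log-odds PWM with a uniform 1/20 background and a pseudocount of 1 is an assumption, and the AAPC, PSSM, ASA and AAindex encodings are not reproduced.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(window):
    """Amino acid composition: fraction of each of the 20 standard residues.

    Non-standard symbols such as the 'X' padding are ignored.
    """
    counts = Counter(c for c in window if c in AMINO_ACIDS)
    total = sum(counts.values()) or 1  # avoid division by zero for all-padding windows
    return [counts[a] / total for a in AMINO_ACIDS]


def build_pwm(positive_windows, pseudocount=1.0):
    """Log-odds position weight matrix from equal-length positive windows.

    Uses additive pseudocounts and a uniform 1/20 background; both choices
    are assumptions made for illustration.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pwm = []
    for pos in range(len(positive_windows[0])):
        counts = Counter(w[pos] for w in positive_windows if w[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pwm.append({
            a: math.log((counts[a] + pseudocount) / total / background)
            for a in AMINO_ACIDS
        })
    return pwm


def pwm_features(window, pwm):
    """Per-position log-odds scores; padding or unknown residues score 0."""
    return [column.get(res, 0.0) for column, res in zip(pwm, window)]


def hybrid_features(window, pwm):
    """Concatenate PWM and AAC vectors into one feature vector."""
    return pwm_features(window, pwm) + aac(window)


if __name__ == "__main__":
    positives = ["XMAKPLT", "AKRKRLT", "RKAKLTR"]  # toy K-centred windows
    pwm = build_pwm(positives)
    vector = hybrid_features("PLAKRGS", pwm)
    print(len(vector))  # 7 PWM scores + 20 AAC values
```

Because the PWM is estimated from positive windows only, residues that are enriched at a given offset (such as the K and R enrichment reported above) receive positive log-odds there. An AAindex block could be appended the same way by mapping each window residue through a chosen physicochemical scale; which scales the authors used is not stated in the abstract.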

CONCLUSION

Compared with an existing prediction tool, the selected models trained with hybrid features achieved promising accuracy on an independent testing dataset. In short, this work not only characterized the substrate preference of protein carbonylation but also demonstrated that the proposed method offers a feasible means of accelerating the preliminary discovery of protein carbonylation sites.
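The evaluation quantities used in the abstract (sensitivity, specificity, accuracy and MCC, estimated by five-fold cross-validation) follow standard confusion-matrix definitions. The sketch below computes them and generates plain, non-stratified k-fold splits; how the authors actually built their folds is not stated in the abstract, so the seeded shuffle here is an assumption, and a per-class (stratified) split may be preferable for imbalanced site data.

```python
import math
import random


def confusion_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient.

    Any ratio with a zero denominator is reported as 0 (a common convention).
    """
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k folds after a seeded shuffle."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # round-robin fold assignment
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    print(confusion_metrics(tp=40, tn=45, fp=5, fn=10))
    for train, test in k_fold_indices(10, k=5):
        print(len(train), len(test))
```

In a full pipeline, each fold's training indices would be used to fit a classifier and build any data-dependent features (such as the PWM) before scoring the held-out fold, so that test windows never leak into the matrix they are scored against; the confusion counts from all five folds are then pooled or averaged.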

Figures (PMC image links):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dcca/5374553/92e239dfe63b/12859_2017_1472_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dcca/5374553/911e9fb920a4/12859_2017_1472_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dcca/5374553/9725b558a716/12859_2017_1472_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dcca/5374553/66cda9ee6ee9/12859_2017_1472_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dcca/5374553/ad29fea5d659/12859_2017_1472_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dcca/5374553/6b827a2600f4/12859_2017_1472_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dcca/5374553/e8350a26b280/12859_2017_1472_Fig7_HTML.jpg