
Enhanced prediction of protein functional identity through the integration of sequence and structural features.

Authors

Fujita Suguru, Terada Tohru

Affiliation

Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan.

Publication

Comput Struct Biotechnol J. 2024 Nov 14;23:4124-4130. doi: 10.1016/j.csbj.2024.11.028. eCollection 2024 Dec.

DOI: 10.1016/j.csbj.2024.11.028
PMID: 39624166
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11609699/
Abstract

Although over 300 million protein sequences are registered in a reference sequence database, only 0.2% have experimentally determined functions. This suggests that many valuable proteins, potentially catalyzing novel enzymatic reactions, remain undiscovered among the vast number of function-unknown proteins. In this study, we developed a method to predict whether two proteins catalyze the same enzymatic reaction by analyzing sequence and structural similarities, utilizing structural models predicted by AlphaFold2. We performed pocket detection and domain decomposition for each structural model. The similarity between protein pairs was assessed using features such as full-length sequence similarity, domain structural similarity, and pocket similarity. We developed several models using conventional machine learning algorithms and found that the LightGBM-based model outperformed the other models. Our method also surpassed existing approaches, including those based solely on full-length sequence similarity and state-of-the-art deep learning models. Feature importance analysis revealed that domain sequence identity, calculated through structural alignment, had the greatest influence on the prediction. Therefore, our findings demonstrate that integrating sequence and structural information improves the accuracy of protein function prediction.
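
The pairwise features described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give exact feature definitions, so the identity denominator (columns where both proteins have a residue), the `PairFeatures` field names, and the example scores are all assumptions made for illustration. In practice the structural and pocket similarity scores would come from external tools; here they are placeholder numbers.

```python
from dataclasses import dataclass
from typing import List


def aligned_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Sequence identity over an existing alignment (e.g. one derived
    from a structural superposition).

    Assumption: identity = identical residues / columns where both
    sequences have a residue. Other denominators are also in common use.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned strings must have equal length")
    # Keep only columns in which neither sequence has a gap.
    paired = [(a, b) for a, b in zip(aln_a, aln_b) if a != gap and b != gap]
    if not paired:
        return 0.0
    matches = sum(1 for a, b in paired if a == b)
    return matches / len(paired)


@dataclass
class PairFeatures:
    """Hypothetical feature record for one protein pair."""
    full_length_identity: float
    domain_identity: float
    domain_structural_similarity: float  # placeholder score from a structure aligner
    pocket_similarity: float             # placeholder score from a pocket comparison

    def as_vector(self) -> List[float]:
        # Fixed feature order, as a tabular classifier would expect.
        return [
            self.full_length_identity,
            self.domain_identity,
            self.domain_structural_similarity,
            self.pocket_similarity,
        ]


# Toy alignment: 5 gap-free columns, 4 of them identical.
identity = aligned_identity("MKT-AYL", "MRTGA-L")
features = PairFeatures(
    full_length_identity=0.31,          # made-up value
    domain_identity=identity,
    domain_structural_similarity=0.72,  # made-up value
    pocket_similarity=0.55,             # made-up value
)
print(features.as_vector())
```

A vector like this, one per protein pair with a binary label for "same reaction", is the kind of tabular input a gradient-boosted classifier such as LightGBM takes.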


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e33/11609699/745a2f5655e1/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e33/11609699/915cb6b33b5b/ga1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e33/11609699/a8d4376bdc04/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e33/11609699/15aed5a8dc99/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e33/11609699/d4ba2f32dc8f/gr3.jpg
