EZpred: improving deep learning-based enzyme function prediction using unlabeled sequence homologs.

Authors

Zhang Chengxin, Liu Quancheng, Freddolino Lydia

Affiliations

CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.

Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48197, USA.

Publication information

bioRxiv. 2025 Jul 14:2025.07.09.663945. doi: 10.1101/2025.07.09.663945.

DOI:10.1101/2025.07.09.663945
PMID:40791336
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12338500/
Abstract

Features extracted from sequence homologs significantly enhance the accuracy of deep learning-based protein structure prediction. Indeed, models such as AlphaFold, which extract features from sequence homologs, generally produce more accurate protein structures than single sequence-based methods such as ESMFold. In contrast, features from sequence homologs are seldom employed for deep learning-based protein function prediction. Although a small number of models incorporate function labels from sequence homologs, they cannot use features extracted from sequence homologs that lack function labels. To address this gap, we propose EZpred, the first deep learning model to use unlabeled sequence homologs for protein function prediction. Starting with the target sequence and homologs identified by MMseqs2, EZpred extracts sequence features using the ESMC protein language model. These features are then fed into a deep learning model to predict the Enzyme Commission (EC) numbers of the target protein. For 753 enzymes, the F1-score of EZpred EC number prediction is 4% higher than that of a similar model that does not use sequence homologs and at least 10% higher than that of state-of-the-art EC number prediction models. These results demonstrate the strong positive impact of sequence homologs on deep learning-based enzyme function prediction.
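
The abstract reports results as an F1-score over predicted EC numbers but does not say how that score is averaged. As a rough illustration only, the sketch below computes a micro-averaged F1 over (protein, EC number) pairs, which is one common convention for multi-label function prediction. The function name `micro_f1`, the protein identifiers, and the toy EC assignments are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Set, Tuple


def micro_f1(
    predicted: Dict[str, Set[str]],
    reference: Dict[str, Set[str]],
) -> Tuple[float, float, float]:
    """Return micro-averaged (precision, recall, F1) over (protein, EC) pairs.

    Assumption: only proteins present in `reference` are scored; a protein
    with no prediction contributes false negatives for all its EC numbers.
    """
    tp = fp = fn = 0
    for protein_id, true_ecs in reference.items():
        pred_ecs = predicted.get(protein_id, set())
        tp += len(pred_ecs & true_ecs)   # EC numbers predicted correctly
        fp += len(pred_ecs - true_ecs)   # predicted but not annotated
        fn += len(true_ecs - pred_ecs)   # annotated but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (hypothetical): three enzymes, one of them bifunctional.
reference = {
    "P1": {"1.1.1.1"},
    "P2": {"2.7.11.1", "2.7.10.2"},
    "P3": {"3.4.21.4"},
}
predicted = {
    "P1": {"1.1.1.1"},    # correct
    "P2": {"2.7.11.1"},   # one of two EC numbers recovered
    "P3": {"3.4.21.5"},   # wrong at the fourth EC digit
}

precision, recall, f1 = micro_f1(predicted, reference)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

With this toy data there are 2 true positives, 1 false positive, and 2 false negatives. Other conventions, such as per-protein or per-class averaging, can give different numbers, so scores are only comparable when computed the same way.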

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d6dd/12338500/c0745cdf33e4/nihpp-2025.07.09.663945v1-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d6dd/12338500/ec1b6d35edc9/nihpp-2025.07.09.663945v1-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d6dd/12338500/6f50b8627388/nihpp-2025.07.09.663945v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d6dd/12338500/c6f6393bcd32/nihpp-2025.07.09.663945v1-f0003.jpg
