Predicting protein functions using positive-unlabeled ranking with ontology-based priors.

Affiliations

Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia.

Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia.

Publication information

Bioinformatics. 2024 Jun 28;40(Suppl 1):i401-i409. doi: 10.1093/bioinformatics/btae237.

DOI: 10.1093/bioinformatics/btae237
PMID: 38940168
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11211813/
Abstract

Automated protein function prediction is a crucial and widely studied problem in bioinformatics. Computationally, protein function prediction is a multilabel classification problem in which only positive samples are defined and a large number of annotations are unlabeled. Most existing methods assume that the unlabeled protein function annotations are negatives, which induces the false negative issue: potential positive samples are trained as negatives. We introduce a novel approach named PU-GO, in which we address function prediction as a positive-unlabeled ranking problem. We apply empirical risk minimization, i.e. we minimize the classification risk of a classifier, where class priors are obtained from the Gene Ontology hierarchical structure. We show that our approach is more robust than other state-of-the-art methods on similarity-based and time-based benchmark datasets.
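
To make the idea of risk minimization with class priors concrete, here is a minimal sketch of a generic non-negative positive-unlabeled risk estimator for a single class (one GO term). It is an illustration of the general technique, not PU-GO's actual loss, which the abstract describes as a ranking formulation. The function names, the sigmoid surrogate loss, and the way `prior` is supplied are all assumptions; in particular, how the prior would be derived from the Gene Ontology hierarchy is not shown.

```python
import numpy as np


def sigmoid_loss(scores, label):
    """Surrogate loss in (0, 1): small when the sign of the score matches the label (+1 or -1)."""
    return 1.0 / (1.0 + np.exp(label * scores))


def nn_pu_risk(pos_scores, unl_scores, prior):
    """Non-negative PU risk for one class (hypothetical helper, not from PU-GO).

    pos_scores: classifier scores for samples labeled positive
    unl_scores: classifier scores for unlabeled samples
    prior:      assumed fraction of true positives in the data (class prior)
    """
    # Risk of scoring known positives as negative-looking.
    risk_pos = prior * sigmoid_loss(pos_scores, +1).mean()
    # The unlabeled set mixes positives and negatives, so subtract the
    # share of its "treated as negative" risk that is due to positives.
    risk_pos_as_neg = prior * sigmoid_loss(pos_scores, -1).mean()
    risk_unl_as_neg = sigmoid_loss(unl_scores, -1).mean()
    # Clamp at zero so the estimated negative risk cannot go below zero.
    risk_neg = max(0.0, risk_unl_as_neg - risk_pos_as_neg)
    return risk_pos + risk_neg


# Toy scores: a classifier that ranks positives high versus one that ranks them low.
good = nn_pu_risk(np.array([5.0, 4.0]), np.array([-5.0, -4.0]), prior=0.3)
bad = nn_pu_risk(np.array([-5.0, -4.0]), np.array([5.0, 4.0]), prior=0.3)
print(good < bad)
```

Unlike training with all unlabeled annotations as negatives, this estimator discounts the unlabeled term by the expected contribution of hidden positives, which is the false negative issue the abstract refers to.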

AVAILABILITY AND IMPLEMENTATION

Data and code are available at https://github.com/bio-ontology-research-group/PU-GO.

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eeaf/11211813/53385da0d132/btae237f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eeaf/11211813/585a61cd76f8/btae237f2.jpg
