Predicting protein functions using positive-unlabeled ranking with ontology-based priors.

Affiliations

Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia.

Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia.

Publication information

Bioinformatics. 2024 Jun 28;40(Suppl 1):i401-i409. doi: 10.1093/bioinformatics/btae237.

DOI:10.1093/bioinformatics/btae237

PMID:38940168

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11211813/

Abstract

Automated protein function prediction is a crucial and widely studied problem in bioinformatics. Computationally, protein function prediction is a multilabel classification problem in which only positive samples are defined and a large number of annotations are unlabeled. Most existing methods assume that unlabeled protein function annotations are negatives, which induces the false-negative issue: potential positive samples are trained as negatives. We introduce PU-GO, a novel approach that treats function prediction as a positive-unlabeled ranking problem. We apply empirical risk minimization, i.e. we minimize the classification risk of a classifier, with class priors obtained from the Gene Ontology hierarchical structure. We show that our approach is more robust than other state-of-the-art methods on similarity-based and time-based benchmark datasets.

AVAILABILITY AND IMPLEMENTATION

Data and code are available at https://github.com/bio-ontology-research-group/PU-GO.
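
ILLUSTRATIVE SKETCH

The abstract describes minimizing a classification risk from positive and unlabeled samples using a class prior. As a rough illustration of that general idea, the sketch below computes a generic positive-unlabeled risk estimate for a single function class, with an optional non-negative correction. It is not taken from the PU-GO code base and may differ from the ranking loss the authors actually use. The function names, the sigmoid surrogate loss, and the `prior` argument are assumptions made for this example; in particular, how PU-GO derives the prior from the Gene Ontology hierarchy is not specified in the abstract, so the prior is simply passed in as a number.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss; small when the sign of score agrees with label (+1 or -1)."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


def pu_risk(pos_scores, unl_scores, prior, non_negative=True):
    """Generic positive-unlabeled risk estimate for one function class.

    pos_scores: classifier scores for proteins annotated with the class.
    unl_scores: classifier scores for proteins without that annotation.
    prior: assumed fraction of true positives for the class (hypothetical input).
    """
    # Risk of labeling known positives as positive.
    risk_pos_as_pos = mean([sigmoid_loss(s, +1) for s in pos_scores])
    # Risk of labeling known positives as negative (used as a correction term).
    risk_pos_as_neg = mean([sigmoid_loss(s, -1) for s in pos_scores])
    # Risk of labeling unlabeled samples as negative.
    risk_unl_as_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])

    # Estimated negative-class risk: unlabeled risk minus the share owed to hidden positives.
    negative_part = risk_unl_as_neg - prior * risk_pos_as_neg
    if non_negative:
        # Clip at zero so the estimate cannot go negative.
        negative_part = max(0.0, negative_part)
    return prior * risk_pos_as_pos + negative_part


if __name__ == "__main__":
    # Toy scores only; these are not results from the paper.
    print(pu_risk([2.0, 1.5], [-1.0, 0.5, -2.0], prior=0.2))
```

Treating every unlabeled sample as a true negative corresponds to dropping the correction term `prior * risk_pos_as_neg`; the correction is what lets the estimate account for positives hidden in the unlabeled set.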
