Gene function prediction using labeled and unlabeled data.

Authors

Zhao Xing-Ming, Wang Yong, Chen Luonan, Aihara Kazuyuki

Affiliation

ERATO Aihara Complexity Modelling Project, JST, 4-6-1 Komaba, Meguro, Tokyo, Japan.

Publication

BMC Bioinformatics. 2008 Jan 28;9:57. doi: 10.1186/1471-2105-9-57.

DOI:10.1186/1471-2105-9-57

PMID:18221567

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2275242/

Abstract

BACKGROUND

Gene function prediction is generally formalized as a classification problem and solved with machine learning techniques. Training a classifier usually requires both labeled positive and labeled negative samples. For gene function prediction, however, only positive samples are available: we know which genes have the function of interest, but it is generally unclear which genes do not have it, i.e. which are the negative samples. If all genes outside the target functional family are treated as negative samples, a class imbalance problem arises because only a relatively small number of genes are annotated in each family. Furthermore, false negatives among the heuristically generated negative samples may degrade the classifier.

RESULTS

In this paper, we present a new technique, Annotating Genes with Positive Samples (AGPS), for defining negative samples in gene function prediction. Once the negative samples are defined, predicting the functions of unknown genes becomes straightforward. In addition, the AGPS algorithm can integrate various kinds of data sources to predict gene functions reliably and accurately. With one-class and two-class Support Vector Machines as the core learning algorithms, AGPS performs well for function prediction on yeast genes.
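The general idea of defining negative samples from positive and unlabeled data can be illustrated with a minimal sketch. This is not the AGPS algorithm itself, whose details the abstract does not give. As assumptions for illustration only, a distance-to-centroid score stands in for the one-class SVM, a nearest-centroid rule stands in for the two-class SVM, and the data are synthetic 2-D points; the function names (`select_likely_negatives`, `train_two_class`) and the `fraction` parameter are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def select_likely_negatives(positives, unlabeled, fraction=0.25):
    """Return the unlabeled points least similar to the known positives.

    Assumption: distance to the positive centroid stands in for the score
    of a one-class model trained on positives only.
    """
    center = centroid(positives)
    ranked = sorted(unlabeled, key=lambda p: euclidean(p, center), reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def train_two_class(positives, negatives):
    """Nearest-centroid classifier (assumed stand-in for a two-class SVM)."""
    pos_c, neg_c = centroid(positives), centroid(negatives)

    def predict(point):
        # True means "predicted to have the function".
        return euclidean(point, pos_c) < euclidean(point, neg_c)

    return predict


# Synthetic data: positives cluster near (2, 2); the unlabeled pool mixes
# hidden positives with true negatives near (-2, -2).
rng = random.Random(0)


def sample(cx, cy, n):
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for _ in range(n)]


positives = sample(2, 2, 20)
hidden_positives = sample(2, 2, 10)
true_negatives = sample(-2, -2, 50)
unlabeled = hidden_positives + true_negatives

# Step 1: pick reliable negatives from the unlabeled pool.
negatives = select_likely_negatives(positives, unlabeled, fraction=0.25)
# Step 2: train a two-class model on positives versus selected negatives.
predict = train_two_class(positives, negatives)
```

Choosing only the most dissimilar unlabeled points as negatives is what keeps hidden positives out of the negative set in this toy setting, and it also limits the class imbalance that arises when every unannotated gene is labeled negative.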

CONCLUSION

We proposed a new method for defining negative samples in gene function prediction. Experimental results on yeast genes show that AGPS performs well on both training and test sets. In addition, the overlap between the prediction results and GO annotations on unknown genes also demonstrates the effectiveness of the proposed method.
