A method to improve protein subcellular localization prediction by integrating various biological data sources.

Authors

Tung Thai Quang, Lee Doheon

Affiliation

Department of Bio & Brain Engineering, KAIST, Daejeon City, Republic of Korea.

Publication

BMC Bioinformatics. 2009 Jan 30;10 Suppl 1(Suppl 1):S43. doi: 10.1186/1471-2105-10-S1-S43.

Abstract

BACKGROUND

Protein subcellular localization is crucial information for elucidating protein function. Owing to the need for large-scale genome analysis, computational methods that efficiently predict protein subcellular localization are in high demand. Although much previous work has addressed this task, the problem remains challenging for several reasons: the number of subcellular locations in practice is large; the distribution of proteins across locations is imbalanced, that is, the number of proteins in each location differs markedly; and many proteins are located in multiple locations. It is therefore necessary to explore new features and appropriate classification methods to improve prediction performance.

RESULTS

In this paper we propose a new prediction method that combines two key ideas: 1) information about neighbouring proteins in a probabilistic gene network is integrated to enrich the prediction features; 2) fuzzy k-NN, a classification method based on fuzzy set theory, is applied to predict proteins located at multiple sites. Experiments were conducted on a dataset of budding yeast proteins covering 22 locations, and a significant improvement was observed.
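The two ideas can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how neighbour information is encoded, so the weighted-average feature below, the function names `neighbour_features` and `fuzzy_knn`, the parameters `k` and `m`, and the toy data are all assumptions. The classifier follows the general fuzzy k-NN formulation, in which each training protein carries a membership vector over locations and neighbours are weighted by inverse distance.

```python
import numpy as np

def neighbour_features(weights, memberships):
    """Hypothetical feature: edge-weighted average of neighbours' memberships.

    weights: (n, n) edge-confidence matrix with a zero diagonal.
    memberships: (n, c) location memberships of annotated proteins.
    """
    totals = weights.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # isolated proteins get an all-zero row
    return weights @ memberships / totals

def fuzzy_knn(train_x, train_u, query, k=3, m=2.0, eps=1e-9):
    """Fuzzy k-NN membership vector over locations for one query protein."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    # Inverse-distance weights; m > 1 controls how sharply distance counts.
    w = 1.0 / (dists[nearest] ** (2.0 / (m - 1.0)) + eps)
    return (w[:, None] * train_u[nearest]).sum(axis=0) / w.sum()

# Toy data (assumed): four proteins, two locations. The second protein is
# annotated with both locations, so its membership is split evenly.
train_x = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
train_u = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 1.0]])

u = fuzzy_knn(train_x, train_u, np.array([0.0, 0.4]), k=3)
predicted = np.flatnonzero(u >= 0.1)  # threshold 0.1 is an arbitrary choice
```

Because the output is a membership vector and not a single label, a protein can be assigned to every location whose membership clears a threshold, which is what makes the approach suitable for proteins found at multiple sites. In a real evaluation the neighbour features would have to be computed without using the labels of held-out proteins.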

CONCLUSION

Our results suggest that neighbourhood information from functional gene networks is predictive of subcellular localization. The proposed method can thus be integrated with, and is complementary to, other available prediction methods.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8937/2648781/6b2eb5cdfd44/1471-2105-10-S1-S43-1.jpg
