Integrated protein function prediction by mining function associations, sequences, and protein-protein and gene-gene interaction networks.

Author information

Cao Renzhi, Cheng Jianlin

Affiliations

Computer Science Department, Informatics Institute, University of Missouri, Columbia, MO 65211, USA.

Publication information

Methods. 2016 Jan 15;93:84-91. doi: 10.1016/j.ymeth.2015.09.011. Epub 2015 Sep 11.

Abstract

MOTIVATIONS

Protein function prediction is an important and challenging problem in bioinformatics and computational biology. Functionally relevant biological information, such as protein sequences, gene expression, and protein-protein interactions, has mostly been used separately for protein function prediction. A major challenge is how to effectively integrate multiple sources of information, both traditional and new (for example, spatial gene-gene interaction networks generated from chromosomal conformation data), to improve protein function prediction.

RESULTS

In this work, we developed three probabilistic scores (the MIS, SEQ, and NET scores) to combine protein sequences, function associations, protein-protein interaction networks, and spatial gene-gene interaction networks for protein function prediction. The MIS score is generated mainly from homologous proteins found by PSI-BLAST searches, together with association rules between Gene Ontology (GO) terms learned by mining the Swiss-Prot database. The SEQ score is generated from protein sequences. The NET score is generated from protein-protein interaction and spatial gene-gene interaction networks. These three scores were combined in a new Statistical Multiple Integrative Scoring System (SMISS) to predict protein function. We tested SMISS on the data set of the 2011 Critical Assessment of Function Annotation (CAFA). According to the maximum F-measure, the method performed substantially better than three baseline methods and than an advanced method based on protein profile-sequence comparison, profile-profile comparison, and domain co-occurrence networks.
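The results above are reported in terms of the maximum F-measure (Fmax). As a rough illustration of how that metric is usually computed, the sketch below assumes the protein-centric definition used in the CAFA 2011 assessment. For each score threshold t, precision is averaged over benchmark proteins that have at least one GO term scored at or above t, and recall is averaged over all benchmark proteins. Fmax is the best harmonic mean of the two over all thresholds. The function name `max_f_measure`, the dictionary input layout, and the 0.01-step threshold grid are hypothetical choices for this example. It is not the authors' evaluation code, and for simplicity it skips propagating predicted terms to their GO ancestors.

```python
from typing import Dict, Optional, Sequence, Set


def max_f_measure(predictions: Dict[str, Dict[str, float]],
                  truth: Dict[str, Set[str]],
                  thresholds: Optional[Sequence[float]] = None) -> float:
    """Protein-centric Fmax (CAFA-style sketch, no GO ancestor propagation).

    predictions: protein ID -> {GO term: predicted score in [0, 1]} (hypothetical layout)
    truth: protein ID -> set of benchmark (experimentally annotated) GO terms
    """
    if thresholds is None:
        # Hypothetical threshold grid: 0.01, 0.02, ..., 1.00
        thresholds = [i / 100 for i in range(1, 101)]
    n_benchmark = len(truth)
    best = 0.0
    for t in thresholds:
        precisions = []   # one value per protein with >= 1 prediction at threshold t
        recall_sum = 0.0  # summed over all benchmark proteins
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            tp = len(predicted & true_terms)
            if predicted:
                precisions.append(tp / len(predicted))
            if true_terms:
                recall_sum += tp / len(true_terms)
        if not precisions:
            continue  # precision is undefined when nothing is predicted
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / n_benchmark
        if precision + recall > 0:
            f = 2 * precision * recall / (precision + recall)
            best = max(best, f)
    return best
```

For example, with `truth = {"P1": {"GO:A", "GO:B"}, "P2": {"GO:C"}}` and `predictions = {"P1": {"GO:A": 0.9, "GO:D": 0.4}, "P2": {"GO:C": 0.6, "GO:E": 0.7}}` (toy identifiers, not real GO terms), the best threshold falls between 0.4 and 0.6. There, precision and recall are both 0.75, so the function returns 0.75.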

Similar articles

2
Three-level prediction of protein function by combining profile-sequence search, profile-profile search, and domain co-occurrence networks.
BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S3. doi: 10.1186/1471-2105-14-S3-S3. Epub 2013 Feb 28.
3
Protein function prediction using guilty by association from interaction networks.
Amino Acids. 2015 Dec;47(12):2583-92. doi: 10.1007/s00726-015-2049-3. Epub 2015 Jul 28.
4
Imbalance Data Processing Strategy for Protein Interaction Sites Prediction.
IEEE/ACM Trans Comput Biol Bioinform. 2021 May-Jun;18(3):985-994. doi: 10.1109/TCBB.2019.2953908. Epub 2021 Jun 3.
5
An integrated approach to the prediction of domain-domain interactions.
BMC Bioinformatics. 2006 May 25;7:269. doi: 10.1186/1471-2105-7-269.
7
Protein function prediction using text-based features extracted from the biomedical literature: the CAFA challenge.
BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S14. doi: 10.1186/1471-2105-14-S3-S14. Epub 2013 Feb 28.
10
D-SLIMMER: domain-SLiM interaction motifs miner for sequence based protein-protein interaction data.
J Proteome Res. 2011 Dec 2;10(12):5285-95. doi: 10.1021/pr200312e. Epub 2011 Nov 1.

Cited by

2
Improving protein function prediction by learning and integrating representations of protein sequences and function labels.
Bioinform Adv. 2024 Aug 17;4(1):vbae120. doi: 10.1093/bioadv/vbae120. eCollection 2024.
3
PASS: Protein Annotation Surveillance Site for Protein Annotation Using Homologous Clusters, NLP, and Sequence Similarity Networks.
Front Bioinform. 2021 Sep 29;1:749008. doi: 10.3389/fbinf.2021.749008. eCollection 2021.
4
A tensor-based bi-random walks model for protein function prediction.
BMC Bioinformatics. 2022 May 30;23(1):199. doi: 10.1186/s12859-022-04747-2.
6
annotation of unreviewed acetylcholinesterase (AChE) in some lepidopteran insect pest species reveals the causes of insecticide resistance.
Saudi J Biol Sci. 2021 Apr;28(4):2197-2209. doi: 10.1016/j.sjbs.2021.01.007. Epub 2021 Jan 21.
7
A computational framework for identifying the transcription factors involved in enhancer-promoter loop formation.
Mol Ther Nucleic Acids. 2020 Nov 17;23:347-354. doi: 10.1016/j.omtn.2020.11.011. eCollection 2021 Mar 5.
8
Automatic Gene Function Prediction in the 2020's.
Genes (Basel). 2020 Oct 27;11(11):1264. doi: 10.3390/genes11111264.
9
DNSS2: Improved ab initio protein secondary structure prediction using advanced deep learning architectures.
Proteins. 2021 Feb;89(2):207-217. doi: 10.1002/prot.26007. Epub 2020 Sep 16.

References

1
Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct.
J Biomed Semantics. 2015 Mar 18;6:9. doi: 10.1186/s13326-015-0006-4. eCollection 2015.
2
PANNZER: high-throughput functional annotation of uncharacterized proteins in an error-prone environment.
Bioinformatics. 2015 May 15;31(10):1544-52. doi: 10.1093/bioinformatics/btu851. Epub 2015 Jan 8.
3
PFP/ESG: automated protein function prediction servers enhanced with Gene Ontology visualization tool.
Bioinformatics. 2015 Jan 15;31(2):271-2. doi: 10.1093/bioinformatics/btu646. Epub 2014 Oct 1.
4
Activities at the Universal Protein Resource (UniProt).
Nucleic Acids Res. 2014 Jan;42(Database issue):D191-8. doi: 10.1093/nar/gkt1140. Epub 2013 Nov 18.
6
MS-kNN: protein function prediction by integrating multiple data sources.
BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S8. doi: 10.1186/1471-2105-14-S3-S8. Epub 2013 Feb 28.
7
Three-level prediction of protein function by combining profile-sequence search, profile-profile search, and domain co-occurrence networks.
BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S3. doi: 10.1186/1471-2105-14-S3-S3. Epub 2013 Feb 28.
8
In-depth performance evaluation of PFP and ESG sequence-based function prediction methods in CAFA 2011 experiment.
BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S2. doi: 10.1186/1471-2105-14-S3-S2. Epub 2013 Feb 28.
9
Protein function prediction by massive integration of evolutionary analyses and multiple data sources.
BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S1. doi: 10.1186/1471-2105-14-S3-S1. Epub 2013 Feb 28.
10
A large-scale evaluation of computational protein function prediction.
Nat Methods. 2013 Mar;10(3):221-7. doi: 10.1038/nmeth.2340. Epub 2013 Jan 27.