Genome-scale prediction of moonlighting proteins using diverse protein association information.

Authors

Khan Ishita K, Kihara Daisuke

Affiliations

Department of Computer Science.

Department of Computer Science and Department of Biological Science, Purdue University, West Lafayette, IN, USA.

Publication information

Bioinformatics. 2016 Aug 1;32(15):2281-8. doi: 10.1093/bioinformatics/btw166. Epub 2016 Mar 26.

DOI:10.1093/bioinformatics/btw166

PMID:27153604

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4965633/

Abstract

MOTIVATION

Moonlighting proteins (MPs) perform multiple cellular functions within a single polypeptide chain. To understand the overall landscape of their functional diversity, it is important to establish a computational method that can identify MPs on a genome scale. We previously characterized MPs systematically using functional and omics-scale information. In this work, we develop a computational prediction model that automatically identifies MPs using a diverse range of protein association information.

RESULTS

We incorporated a diverse range of protein association information to extract characteristic features of MPs. The features range from gene ontology (GO), protein-protein interactions, gene expression, phylogenetic profiles, genetic interactions and network-based graph properties to protein structural properties, i.e. intrinsically disordered regions in the protein chain. We then trained machine learning classifiers on this broad feature space to predict MPs. Because many known MPs lack some proteomic features, we developed an imputation technique to fill in the missing features. Results on the control dataset show that MPs can be predicted with over 98% accuracy when GO terms are available. Using only the omics-based features, the method can still identify MPs with over 75% accuracy. Finally, we applied the method to three genomes, Saccharomyces cerevisiae, Caenorhabditis elegans and Homo sapiens, and found that about 2-10% of proteins in the genomes are potential MPs.
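The abstract does not say which imputation technique or which classifiers were used, so the following is only a minimal sketch of the general workflow (fill missing features, then classify). It substitutes column-mean imputation and a nearest-centroid classifier as stand-ins. The feature names, the toy values and all function names are hypothetical and are not taken from the paper or its code.

```python
from math import dist
from statistics import mean

def column_means(rows):
    """Mean of the observed (non-None) values in each feature column."""
    means = []
    for column in zip(*rows):
        observed = [v for v in column if v is not None]
        # Fall back to 0.0 if a feature is missing for every protein.
        means.append(mean(observed) if observed else 0.0)
    return means

def fill_missing(row, means):
    """Replace each missing feature with the corresponding column mean."""
    return [m if v is None else v for v, m in zip(row, means)]

def fit_centroids(rows, labels):
    """Compute one mean feature vector (centroid) per class label."""
    centroids = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids

def predict(centroids, row):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], row))

# Made-up feature vectors: [ppi_degree, expression_corr, disorder_fraction]
features = [
    [12.0, 0.20, 0.40],   # MP
    [15.0, None, 0.50],   # MP, expression feature missing
    [3.0, 0.80, 0.10],    # non-MP
    [None, 0.90, 0.05],   # non-MP, interaction feature missing
]
labels = ["MP", "MP", "non-MP", "non-MP"]

means = column_means(features)
filled = [fill_missing(row, means) for row in features]
model = fit_centroids(filled, labels)

# A new protein with one missing feature is imputed with the training means.
query = fill_missing([14.0, 0.25, None], means)
print(predict(model, query))
```

The sketch leaves the features unscaled, so the feature with the largest numeric range dominates the distance. A real pipeline would likely normalize the features and use a stronger classifier.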

AVAILABILITY AND IMPLEMENTATION

Code is available at http://kiharalab.org/MPprediction

CONTACT

dkihara@purdue.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
