Spriggs R V, Murakami Y, Nakamura H, Jones S
Department of Chemistry and Biochemistry, School of Life Sciences, John Maynard-Smith Building, University of Sussex, Falmer, Brighton, UK.
Bioinformatics. 2009 Jun 15;25(12):1492-7. doi: 10.1093/bioinformatics/btp257. Epub 2009 Apr 23.
All eukaryotic proteomes are characterized by a significant percentage of proteins of unknown function. Computational function prediction methods are therefore essential as initial steps in the function annotation process. This article describes an annotation method (PiRaNhA) for the prediction of RNA-binding residues (RBRs) from protein sequence information. A series of sequence properties (position specific scoring matrices, interface propensities, predicted accessibility and hydrophobicity) are used to train a support vector machine. This method is then evaluated for its potential to be applied to RNA-binding function prediction at the level of the complete protein.
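As a rough illustration of how a per-residue sequence property could be turned into classifier input, the sketch below builds a sliding-window feature vector for each residue from one property only, hydrophobicity. The abstract does not state a window size, a hydrophobicity scale, or a padding scheme, so the Kyte-Doolittle scale, the `half_width` parameter, the zero padding, and the function name `window_features` are all assumptions made for this example and not the PiRaNhA implementation.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_features(sequence, half_width=2, pad=0.0):
    """Return one feature vector per residue.

    Each vector holds the hydropathy values of the residue and its
    `half_width` neighbours on either side. Positions that fall outside
    the sequence, and residues missing from the scale, get `pad`.
    The window size and padding are hypothetical, not taken from the paper.
    """
    values = [KYTE_DOOLITTLE.get(res, pad) for res in sequence.upper()]
    features = []
    for i in range(len(values)):
        vector = []
        for j in range(i - half_width, i + half_width + 1):
            vector.append(values[j] if 0 <= j < len(values) else pad)
        features.append(vector)
    return features


if __name__ == "__main__":
    for residue, vector in zip("AKL", window_features("AKL", half_width=1)):
        print(residue, vector)
```

In a setup like the one the abstract describes, vectors of this kind would be concatenated with the other properties (scoring-matrix rows, interface propensities, predicted accessibility) before being passed to a support vector machine.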
The 5-fold cross-validation of PiRaNhA on a dataset of 81 RNA-binding proteins achieves a Matthews Correlation Coefficient (MCC) of 0.50 and accuracy of 87.2%. When used to predict RBRs in 42 proteins not used in training, PiRaNhA achieves an MCC of 0.41 and accuracy of 84.5%. Decision values from the PiRaNhA predictions were used in a second SVM to make predictions of RNA-binding function at the protein level, achieving an MCC of 0.53 and accuracy of 76.1%. The PiRaNhA RBR predictions allow experimentalists to perform more targeted experiments for function annotation; and the prediction of RNA-binding function at the protein level shows promise for proteome-wide annotations.
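The two metrics reported above can be computed from a confusion matrix as in the minimal sketch below. The formulas are the standard definitions of accuracy and the Matthews Correlation Coefficient; the label vectors are made-up toy data for illustration, not results from the paper, and the function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels, where 1 = binding residue."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of residues classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels only: 3 binding residues out of 10.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0]
    counts = confusion_counts(y_true, y_pred)
    print("accuracy:", accuracy(*counts))
    print("MCC:", round(mcc(*counts), 3))
```

Because binding residues are usually a minority class, accuracy can stay high even when many of them are missed, which is one reason to report MCC alongside it.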
Freely available on the web at www.bioinformatics.sussex.ac.uk/PIRANHA or http://piranha.protein.osaka-u.ac.jp.
Supplementary data are available at Bioinformatics online.