Pugalenthi Ganesan, Tang Ke, Suganthan P N, Archunan G, Sowdhamini R
School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore.
BMC Bioinformatics. 2007 Sep 19;8:351. doi: 10.1186/1471-2105-8-351.
Odorant-binding proteins (OBPs) are believed to shuttle odorants from the environment to the underlying odorant receptors, for which they could potentially serve as odorant presenters. Although several sequence-based search methods have been used for protein family prediction, less effort has been devoted to predicting OBPs from sequence data, a task made more challenging by the poor sequence identity among these proteins.
In this paper, we propose a new algorithm that uses a Regularized Least Squares Classifier (RLSC) in conjunction with multiple physicochemical properties of amino acids to predict odorant-binding proteins. The algorithm was applied to a dataset derived from the Pfam and GenDiS databases, and we obtained an overall prediction accuracy of 97.7% (94.5% and 98.4% for the positive and negative classes, respectively).
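To make the classifier concrete, the following is a minimal sketch of a kernel RLSC, not the authors' implementation. It assumes each protein has already been encoded as a fixed-length numeric vector of sequence-derived physicochemical descriptors. The RBF kernel, the values of `lam` and `gamma`, the function names, and the six toy vectors are all illustrative assumptions, since the abstract specifies none of them. Training reduces to solving the linear system (K + λnI)c = y, and a new sample is labeled by the sign of its kernel-weighted score.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def rlsc_fit(X, y, lam=0.1, gamma=1.0):
    """Return coefficients c solving (K + lam * n * I) c = y."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += lam * n  # Tikhonov regularization on the diagonal
    return solve(K, y)

def rlsc_predict(X_train, coef, x, gamma=1.0):
    """Label a new sample +1 or -1 by the sign of its kernel-weighted score."""
    score = sum(c * rbf_kernel(xi, x, gamma) for c, xi in zip(coef, X_train))
    return 1 if score >= 0 else -1

# Synthetic two-dimensional toy vectors, NOT real protein descriptors.
X_train = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],   # class -1
           [2.0, 2.0], [2.1, 1.8], [1.9, 2.2]]   # class +1
y_train = [-1, -1, -1, 1, 1, 1]

coef = rlsc_fit(X_train, y_train)
print(rlsc_predict(X_train, coef, [0.1, 0.1]))
print(rlsc_predict(X_train, coef, [2.0, 2.1]))
```

Unlike a support vector machine, this formulation needs only one linear solve, at the cost of a dense coefficient vector with one entry per training sample.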
Our study suggests that RLSC is potentially useful for predicting odorant-binding proteins from sequence-derived properties irrespective of sequence similarity. Our method correctly predicts 92.8% of 56 odorant-binding proteins that are non-homologous to any protein in the Swiss-Prot database and 97.1% of the 414 proteins in an independent dataset, suggesting that RLSC can facilitate the prediction of odorant-binding proteins from sequence information.
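As a rough reading aid, the reported percentages can be converted into approximate protein counts. The abstract does not state these counts, so the values below are inferences obtained by rounding, and the helper name `implied_count` is hypothetical.

```python
def implied_count(percent, total):
    """Nearest whole number of proteins matching a reported percentage."""
    return round(percent / 100 * total)

# Counts are inferred from the reported percentages, not taken from the paper.
non_homologous_hits = implied_count(92.8, 56)    # OBPs non-homologous to Swiss-Prot entries
independent_hits = implied_count(97.1, 414)      # independent dataset proteins

print(non_homologous_hits, "of 56")
print(independent_hits, "of 414")
```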