A machine learning approach for the identification of odorant binding proteins from sequence-derived properties.

Authors

Pugalenthi Ganesan, Tang Ke, Suganthan P N, Archunan G, Sowdhamini R

Affiliation

School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore.

Publication

BMC Bioinformatics. 2007 Sep 19;8:351. doi: 10.1186/1471-2105-8-351.

Abstract

BACKGROUND

Odorant binding proteins (OBPs) are believed to shuttle odorants from the environment to the underlying odorant receptors, for which they could potentially serve as odorant presenters. Although several sequence-based search methods have been used for protein family prediction, less effort has been devoted to predicting OBPs from sequence data. The task is also more challenging because sequence identity among these proteins is low.

RESULTS

In this paper, we propose a new algorithm that uses a Regularized Least Squares Classifier (RLSC) in conjunction with multiple physicochemical properties of amino acids to predict odorant-binding proteins. The algorithm was applied to a dataset derived from the Pfam and GenDiS databases, and we obtained an overall prediction accuracy of 97.7% (94.5% and 98.4% for the positive and negative classes, respectively).
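
To make the idea concrete, the following is a minimal sketch of a kernel regularized least squares classifier applied to sequence-derived features. It is not the authors' implementation: the feature set (amino acid composition plus mean Kyte-Doolittle hydropathy), the RBF kernel, the values of `lam` and `gamma`, and the toy sequences are all assumptions chosen for illustration, and the abstract does not state which properties or kernel were actually used.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index (an assumed stand-in for the
# "multiple physicochemical properties" mentioned in the abstract).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def features(seq):
    """Return 20 composition fractions followed by the mean hydropathy."""
    residues = [aa for aa in seq.upper() if aa in KD]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    composition = [residues.count(aa) / n for aa in AMINO_ACIDS]
    mean_hydropathy = sum(KD[aa] for aa in residues) / n
    return np.array(composition + [mean_hydropathy])


def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dist = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def rlsc_fit(X, y, lam=0.1, gamma=1.0):
    """Solve (K + lam * I) c = y for the coefficient vector c; y is +1 / -1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)


def rlsc_predict(X_train, c, X_new, gamma=1.0):
    """Classify new points by the sign of the kernel expansion."""
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    scores = rbf_kernel(X_new, X_train, gamma) @ c
    return np.where(scores >= 0, 1, -1)


if __name__ == "__main__":
    # Hypothetical toy sequences and labels, not real OBP data.
    toy = [
        ("MKLLVVAALCLGVASAILLV", 1),
        ("MKVLIAALCVGLASAVLLIA", 1),
        ("DEKRDEKRNQDEKRSTDEKR", -1),
        ("KRDENQKRDESTKRDEKRDE", -1),
    ]
    X = np.array([features(s) for s, _ in toy])
    y = np.array([label for _, label in toy])
    c = rlsc_fit(X, y)
    print(rlsc_predict(X, c, np.array([features("MKLIVAALLCGVASALLVIA")])))
```

Training reduces to a single linear solve, which is the main practical difference from a support vector machine. Some formulations scale the regularization term by the number of training samples; here that factor is assumed to be absorbed into `lam`.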

CONCLUSION

Our study suggests that RLSC is potentially useful for predicting odorant binding proteins from sequence-derived properties, irrespective of sequence similarity. Our method correctly predicts 92.8% of 56 odorant binding proteins that are non-homologous to any protein in the Swiss-Prot database and 97.1% of the 414 proteins in an independent dataset, which supports the use of RLSC for predicting odorant binding proteins from sequence information.
