A machine learning approach for the identification of odorant binding proteins from sequence-derived properties.

Author information

Pugalenthi Ganesan, Tang Ke, Suganthan P N, Archunan G, Sowdhamini R

Affiliation

School of Electrical and Electronic Engineering, Nanyang Technological University, 639798, Singapore.

Publication information

BMC Bioinformatics. 2007 Sep 19;8:351. doi: 10.1186/1471-2105-8-351.

DOI: 10.1186/1471-2105-8-351

PMID: 17880712

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2216042/

Abstract

BACKGROUND

Odorant binding proteins (OBPs) are believed to shuttle odorants from the environment to the underlying odorant receptors, for which they could potentially serve as odorant presenters. Although several sequence-based search methods have been exploited for protein family prediction, less effort has been devoted to predicting OBPs from sequence data, a task made more challenging by the poor sequence identity among these proteins.

RESULTS

In this paper, we propose a new algorithm that uses a Regularized Least Squares Classifier (RLSC) in conjunction with multiple physicochemical properties of amino acids to predict odorant-binding proteins. Applied to a dataset derived from the Pfam and GenDiS databases, the algorithm achieved an overall prediction accuracy of 97.7% (94.5% and 98.4% for the positive and negative classes, respectively).
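As a rough illustration of the idea, the sketch below trains a kernel regularized least squares classifier on simple sequence-derived features. It is not the authors' implementation: the abstract does not specify the feature set, kernel, or hyperparameters, so the features used here (amino acid composition plus mean Kyte-Doolittle hydropathy), the RBF kernel, the values of `lam` and `gamma`, and the toy sequences are all assumptions chosen for illustration. The toy sequences are invented and are not real OBPs.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (one example of a physicochemical property).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def featurize(seq):
    """Assumed features: 20 composition fractions + mean hydropathy."""
    seq = seq.upper()
    comp = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float) / len(seq)
    hydropathy = np.mean([KD[a] for a in seq])
    return np.append(comp, hydropathy)

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

class RLSC:
    """Kernel RLSC: solve (K + lam * n * I) c = y, predict sign(K_new c)."""
    def __init__(self, lam=0.1, gamma=1.0):  # hypothetical hyperparameters
        self.lam, self.gamma = lam, gamma

    def fit(self, X, y):
        self.X_train = X
        n = len(y)
        K = rbf_kernel(X, X, self.gamma)
        self.coef = np.linalg.solve(K + self.lam * n * np.eye(n), y)
        return self

    def decision_function(self, X):
        return rbf_kernel(X, self.X_train, self.gamma) @ self.coef

    def predict(self, X):
        return np.where(self.decision_function(X) >= 0, 1, -1)

# Invented toy data: +1 = hydrophobic-rich, -1 = charged-rich sequences.
train_seqs = ["MKLLVVLALCLAVASA", "MLLIVFAALLCVGAWA", "MAVLLFIALCCLVAGA",
              "MKDEEKRRDEKSDEKR", "MSEDKRKEEDDRKQEN", "MDKKEERDSRKEDQKE"]
y_train = np.array([1, 1, 1, -1, -1, -1], dtype=float)
X_train = np.array([featurize(s) for s in train_seqs])

model = RLSC().fit(X_train, y_train)
X_test = np.array([featurize(s) for s in ["MALLVLAACLLVAGSA", "MEEKRDDKKESRDEEK"]])
print(model.predict(X_test))
```

Training reduces to a single linear solve, which is the main practical appeal of RLSC compared with classifiers that require iterative optimization. In a real setting, `lam` and `gamma` would be tuned by cross-validation.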

CONCLUSION

Our study suggests that RLSC is potentially useful for predicting odorant binding proteins from sequence-derived properties irrespective of sequence similarity. Our method correctly predicts 92.8% of 56 odorant binding proteins that are non-homologous to any protein in the Swiss-Prot database and 97.1% of the 414 proteins in an independent dataset, suggesting that the RLSC method can facilitate the prediction of odorant binding proteins from sequence information.
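The abstract gives percentages rather than counts. As a back-of-the-envelope check, and assuming each percentage is simply the fraction of proteins predicted correctly, the implied counts can be recovered as follows. The helper name `implied_count` is hypothetical, and the resulting counts are an inference, not figures stated in the source.

```python
def implied_count(percent, total):
    """Nearest whole number of proteins corresponding to a reported percentage."""
    return round(percent / 100 * total)

# Figures taken from the abstract; the counts themselves are inferred.
nonhomologous_hits = implied_count(92.8, 56)   # non-homologous OBP set
independent_hits = implied_count(97.1, 414)    # independent dataset

print(nonhomologous_hits, "of 56")
print(independent_hits, "of 414")
```

Under this assumption, the figures correspond to roughly 52 of 56 and 402 of 414 proteins. Note that 52/56 is about 92.86%, so the reported 92.8% would be a truncated rather than rounded value if that count is right.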
