
Identification of functionally diverse lipocalin proteins from sequence information using support vector machine.

Affiliation

School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, 639798, Singapore.

Publication information

Amino Acids. 2010 Aug;39(3):777-83. doi: 10.1007/s00726-010-0520-8. Epub 2010 Feb 26.

DOI:10.1007/s00726-010-0520-8
PMID:20186553
Abstract

Lipocalins are functionally diverse proteins composed of 120-180 amino acid residues. Members of this family have several important biological functions, including ligand transport, cryptic coloration, sensory transduction, endonuclease activity, stress response activity in plants, odorant binding, prostaglandin biosynthesis, cellular homeostasis regulation, immunity and immunotherapy. Identifying lipocalins from protein sequence is challenging because of their poor sequence identity, which often falls below the twilight zone. So far, no specific method has been reported to identify lipocalins from primary sequence. In this paper, we report a support vector machine (SVM) approach, LipoPred, that predicts lipocalins from protein sequence using sequence-derived properties. LipoPred was trained on a dataset of 325 lipocalin proteins and 325 non-lipocalin proteins, and evaluated on an independent set of 140 lipocalin proteins and 21,447 non-lipocalin proteins. LipoPred achieved 88.61% accuracy with 89.26% sensitivity, 85.27% specificity and a Matthews correlation coefficient (MCC) of 0.74. When applied to the test dataset, LipoPred achieved 84.25% accuracy with 88.57% sensitivity, 84.22% specificity and an MCC of 0.16. LipoPred outperformed the PSI-BLAST, HMM and SVM-Prot methods. Out of 218 lipocalins, LipoPred correctly predicted 194 proteins, including 39 lipocalins that are non-homologous to any protein in the SWISSPROT database. This result shows that LipoPred is potentially useful for predicting lipocalin proteins that have no sequence homologs in the sequence databases. Further, the successful prediction of nine hypothetical lipocalin proteins and five new members of the lipocalin family shows that LipoPred can be used efficiently to identify and annotate new lipocalin proteins from sequence databases. The LipoPred software and dataset are available at http://www3.ntu.edu.sg/home/EPNSugan/index_files/lipopred.htm.
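
The test-set MCC of 0.16 can look inconsistent with 84.25% accuracy, but it follows from the class imbalance (140 lipocalins against 21,447 non-lipocalins). The sketch below is a rough check, not the authors' code. The abstract does not give a confusion matrix, so the counts here are an assumption, reconstructed by rounding the reported sensitivity and specificity against the reported set sizes. The function names are hypothetical.

```python
import math

def confusion_from_rates(n_pos, n_neg, sensitivity, specificity):
    """Reconstruct approximate counts from reported rates (an assumption:
    the abstract reports only percentages, not the confusion matrix)."""
    tp = round(sensitivity * n_pos)   # true positives
    tn = round(specificity * n_neg)   # true negatives
    fn = n_pos - tp                   # lipocalins that were missed
    fp = n_neg - tn                   # non-lipocalins flagged as lipocalins
    return tp, fn, tn, fp

def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Independent test set sizes and rates as reported in the abstract.
tp, fn, tn, fp = confusion_from_rates(140, 21447, 0.8857, 0.8422)

accuracy = (tp + tn) / (tp + fn + tn + fp)
precision = tp / (tp + fp)
test_mcc = mcc(tp, fn, tn, fp)

print(f"TP={tp} FN={fn} TN={tn} FP={fp}")
print(f"accuracy={accuracy:.4f} precision={precision:.4f} MCC={test_mcc:.2f}")
```

Under these reconstructed counts, only about 3.5% of the proteins predicted as lipocalins are true lipocalins. The false positives from the much larger negative class pull the MCC down even though sensitivity and specificity stay in the mid-to-high 80s.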

Similar articles

1
Identification of functionally diverse lipocalin proteins from sequence information using support vector machine.
Amino Acids. 2010 Aug;39(3):777-83. doi: 10.1007/s00726-010-0520-8. Epub 2010 Feb 26.
2
SMpred: a support vector machine approach to identify structural motifs in protein structure without using evolutionary information.
J Biomol Struct Dyn. 2010 Dec;28(3):405-14. doi: 10.1080/07391102.2010.10507369.
3
DNA-Prot: identification of DNA binding proteins from protein sequence information using random forest.
J Biomol Struct Dyn. 2009 Jun;26(6):679-86. doi: 10.1080/07391102.2009.10507281.
4
Exon-intron structure of outlier tick lipocalins indicate a monophyletic origin within the larger lipocalin family.
Insect Biochem Mol Biol. 2004 Jun;34(6):585-94. doi: 10.1016/j.ibmb.2004.03.006.
5
Distantly related lipocalins share two conserved clusters of hydrophobic residues: use in homology modeling.
BMC Struct Biol. 2008 Jan 11;8:1. doi: 10.1186/1472-6807-8-1.
6
Exon-intron structure and evolution of the Lipocalin gene family.
Mol Biol Evol. 2003 May;20(5):775-83. doi: 10.1093/molbev/msg079. Epub 2003 Apr 2.
7
Prediction of the functional class of metal-binding proteins from sequence derived physicochemical properties by support vector machine approach.
BMC Bioinformatics. 2006 Dec 18;7 Suppl 5(Suppl 5):S13. doi: 10.1186/1471-2105-7-S5-S13.
8
Better prediction of the location of alpha-turns in proteins with support vector machine.
Proteins. 2006 Oct 1;65(1):49-54. doi: 10.1002/prot.21062.
9
Prediction of ubiquitin proteins using artificial neural networks, hidden markov model and support vector machines.
In Silico Biol. 2007;7(6):559-68.
10
Prediction of RNA binding sites in a protein using SVM and PSSM profile.
Proteins. 2008 Apr;71(1):189-94. doi: 10.1002/prot.21677.

Cited by

1
Optimizing lipocalin sequence classification with ensemble deep learning models.
PLoS One. 2025 Apr 16;20(4):e0319329. doi: 10.1371/journal.pone.0319329. eCollection 2025.
2
Pre-trained protein language model sheds new light on the prediction of Arabidopsis protein-protein interactions.
Plant Methods. 2023 Dec 7;19(1):141. doi: 10.1186/s13007-023-01119-6.
3
SubmitoLoc: Identification of mitochondrial sub cellular locations of proteins using support vector machine.
Bioinformation. 2019 Dec 31;15(12):863-868. doi: 10.6026/97320630015863. eCollection 2019.
4
Probing an optimal class distribution for enhancing prediction and feature characterization of plant virus-encoded RNA-silencing suppressors.
3 Biotech. 2016 Jun;6(1):93. doi: 10.1007/s13205-016-0410-1. Epub 2016 Mar 21.
5
DOR - a Database of Olfactory Receptors - Integrated Repository for Sequence and Secondary Structural Information of Olfactory Receptors in Selected Eukaryotic Genomes.
Bioinform Biol Insights. 2014 Jun 12;8:147-58. doi: 10.4137/BBI.S14858. eCollection 2014.
6
Fuzzy clustering of physicochemical and biochemical properties of amino acids.
Amino Acids. 2012 Aug;43(2):583-94. doi: 10.1007/s00726-011-1106-9. Epub 2011 Oct 13.