
Supervised learning method for the prediction of subcellular localization of proteins using amino acid and amino acid pair composition.

Author information

Habib Tanwir, Zhang Chaoyang, Yang Jack Y, Yang Mary Qu, Deng Youping

Affiliation

Department of Biological Sciences, University of Southern Mississippi, Hattiesburg, MS 39406, USA.

Publication information

BMC Genomics. 2008;9 Suppl 1(Suppl 1):S16. doi: 10.1186/1471-2164-9-S1-S16.

DOI:10.1186/1471-2164-9-S1-S16
PMID:18366605
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2386058/
Abstract

BACKGROUND

Knowing where a protein resides in the cell is an important step toward understanding its function, so it is highly desirable to predict a protein's subcellular location automatically from its sequence. The most widely studied prediction methods rely on signal peptides, on localization inferred from sequence homology, or on the correlation between subcellular location and total amino acid composition. Taking both amino acid composition and amino acid pair composition into account helps improve prediction accuracy.
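The two feature types named above can be illustrated with a minimal Python sketch. It assumes that "amino acid pair composition" means the frequency of adjacent residue pairs (dipeptides), which the abstract does not spell out. The function names `aa_composition` and `aa_pair_composition` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed order (assumption: adjacent pairs only).
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq):
    """Return a 20-dimensional vector of residue fractions."""
    seq = seq.upper()
    counts = Counter(c for c in seq if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def aa_pair_composition(seq):
    """Return a 400-dimensional vector of adjacent-pair fractions."""
    seq = seq.upper()
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    # Skip pairs that contain a non-standard symbol such as X or B.
    counts = Counter(
        p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]
```

Each sequence, whatever its length, maps to a fixed-length vector, which is what allows a classifier such as an SVM to be trained on proteins of different sizes.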

RESULTS

We constructed a dataset of protein sequences from the SWISS-PROT database and divided them into 12 classes according to subcellular location. SVM modules were trained to predict subcellular location from amino acid composition and from amino acid pair composition. Results were computed with 10-fold cross-validation. The radial basis function (RBF) kernel outperformed the polynomial and linear kernels. Total prediction accuracy reached 71.8% for amino acid composition and 77.0% for amino acid pair composition. To observe the effect of the number of subcellular locations, we constructed two more datasets with nine and five subcellular locations; total accuracy improved further, to 79.9% and 85.66%.
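For readers unfamiliar with the three kernel families compared here, the following sketch gives their standard textbook definitions in plain Python. The parameter names and default values (`gamma`, `coef0`, `degree`) are illustrative assumptions; the abstract does not report the settings the authors used.

```python
import math


def linear_kernel(x, y):
    """Plain dot product: K(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """K(x, y) = (gamma * x . y + coef0) ** degree (defaults are assumed)."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2) (gamma default is assumed)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

The RBF kernel depends only on the distance between two feature vectors, so it equals 1 for identical vectors and decays toward 0 as their compositions diverge.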

CONCLUSIONS

A new SVM-based approach built on amino acid and amino acid pair composition is presented. The results show that data simulation and taking more protein features into consideration improve accuracy to a great extent. We also observed that the dataset needs to be constructed with the distribution of data across all classes in mind.
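One common way to respect class distribution during k-fold cross-validation is to stratify the folds so that each fold receives a proportional share of every class. The sketch below illustrates that idea only; the abstract does not say whether the authors stratified their 10 folds, and the function name `stratified_folds` is hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, k=10):
    """Assign sample indices to k folds, spreading each class evenly.

    Returns a list of k lists of indices. Illustrative only; not the
    authors' procedure.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # rotate the starting fold so overall fold sizes stay balanced
    for label in sorted(by_class):
        members = by_class[label]
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds
```

With a rare class that has fewer than k members, some folds necessarily contain no example of it, which is one concrete reason a multi-class localization dataset may need rebalancing before evaluation.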
