The accurate prediction of protein family from amino acid sequence by measuring features of sequence fragments.

Authors

Hong Huixiao, Hong Qilong, Perkins Roger, Shi Leming, Fang Hong, Su Zhenqiang, Dragan Yvonne, Fuscoe James C, Tong Weida

Affiliation

Division of Systems Toxicology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, Arkansas 72079, USA.

Publication

J Comput Biol. 2009 Dec;16(12):1671-88. doi: 10.1089/cmb.2008.0115.

DOI: 10.1089/cmb.2008.0115
PMID: 20047490

Abstract

The rapid advances in proteomic analyses coupled with the completion of multiple genomes have led to an increased demand for determining protein functions. The first step is classification or prediction into families. A method was developed for the prediction of protein family based only on protein sequence using support vector machine (SVM) models. In these models, the amino acids were classified into three categories (apolar, polar, and charged). Consecutive fragments ranging from one to five were annotated by amino acid type to define the protein features of each protein. SVM models were constructed based on the protein features of a training set of proteins and then examined with an independent set of proteins. The approach was tested for 20 protein families from the iProClass database of Protein Information Resources (PIR). For two-class SVM models, an average prediction accuracy of 0.9985 was achieved, while for multi-class SVM models an accuracy of 0.9941 was achieved. This study demonstrates that SVM based methods can accurately recognize and predict the protein family to which a sequence belongs based solely on its primary amino acid sequence.
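
The feature construction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract names the three categories (apolar, polar, charged) but does not list which residues belong to each, so the grouping in `CLASS_MEMBERS` is an assumption, as are the per-length frequency normalization, the handling of unknown residues, and all function names. It reads "consecutive fragments ranging from one to five" as fragments of one to five residues.

```python
from collections import Counter
from itertools import product

# ASSUMPTION: the abstract names the three categories but not their members.
# This grouping is a common textbook split, not necessarily the authors' own.
CLASS_MEMBERS = {
    "A": "AVLIMFWPG",  # apolar
    "P": "STCYNQ",     # polar
    "C": "DEKRH",      # charged
}
AA_TO_CLASS = {aa: label for label, members in CLASS_MEMBERS.items() for aa in members}

MAX_FRAGMENT_LEN = 5  # fragments of 1 to 5 consecutive residues

# Fixed feature order: every class string of length 1..5
# (3 + 9 + 27 + 81 + 243 = 363 features).
FEATURE_NAMES = [
    "".join(combo)
    for k in range(1, MAX_FRAGMENT_LEN + 1)
    for combo in product("APC", repeat=k)
]


def reduce_sequence(seq):
    """Map each residue to its class letter; unknown residues become '?'."""
    return "".join(AA_TO_CLASS.get(aa, "?") for aa in seq.upper())


def fragment_features(seq):
    """Return a 363-element vector of fragment frequencies.

    Frequencies are normalized per fragment length (an assumption; the
    abstract does not say how the counts were scaled). Fragments that
    contain an unknown residue are skipped.
    """
    reduced = reduce_sequence(seq)
    counts = Counter()   # occurrences of each class fragment
    totals = Counter()   # number of valid fragments per length
    for k in range(1, MAX_FRAGMENT_LEN + 1):
        for i in range(len(reduced) - k + 1):
            frag = reduced[i:i + k]
            if "?" in frag:
                continue
            counts[frag] += 1
            totals[k] += 1
    return [
        counts[name] / totals[len(name)] if totals[len(name)] else 0.0
        for name in FEATURE_NAMES
    ]


if __name__ == "__main__":
    vec = fragment_features("MKTAYIAKQR")  # hypothetical example sequence
    print(len(vec))
```

Vectors of this kind could then be passed to any SVM implementation (for example scikit-learn's `sklearn.svm.SVC`). The abstract does not state which SVM software, kernel, or parameters the authors used.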

Similar articles

1
The accurate prediction of protein family from amino acid sequence by measuring features of sequence fragments.
J Comput Biol. 2009 Dec;16(12):1671-88. doi: 10.1089/cmb.2008.0115.
2
Remote protein homology detection and fold recognition using two-layer support vector machine classifiers.
Comput Biol Med. 2011 Aug;41(8):687-99. doi: 10.1016/j.compbiomed.2011.06.004. Epub 2011 Jun 25.
3
Remote protein homology detection using recurrence quantification analysis and amino acid physicochemical properties.
J Theor Biol. 2008 May 7;252(1):145-54. doi: 10.1016/j.jtbi.2008.01.028. Epub 2008 Feb 7.
4
Predicting protein secondary structure by a support vector machine based on a new coding scheme.
Genome Inform. 2004;15(2):181-90.
5
Application of latent semantic analysis to protein remote homology detection.
Bioinformatics. 2006 Feb 1;22(3):285-90. doi: 10.1093/bioinformatics/bti801. Epub 2005 Nov 29.
6
Mismatch string kernels for discriminative protein classification.
Bioinformatics. 2004 Mar 1;20(4):467-76. doi: 10.1093/bioinformatics/btg431. Epub 2004 Jan 22.
7
Efficient remote homology detection using local structure.
Bioinformatics. 2003 Nov 22;19(17):2294-301. doi: 10.1093/bioinformatics/btg317.
8
Signal peptide discrimination and cleavage site identification using SVM and NN.
Comput Biol Med. 2014 Feb;45:98-110. doi: 10.1016/j.compbiomed.2013.11.017. Epub 2013 Dec 1.
9
SVM-Prot: Web-based support vector machine software for functional classification of a protein from its primary sequence.
Nucleic Acids Res. 2003 Jul 1;31(13):3692-7. doi: 10.1093/nar/gkg600.
10
Two multi-classification strategies used on SVM to predict protein structural classes by using auto covariance.
Interdiscip Sci. 2009 Dec;1(4):315-9. doi: 10.1007/s12539-009-0066-1. Epub 2009 Nov 14.

Cited by

1
Competitive docking model for prediction of the human nicotinic acetylcholine receptor α7 binding of tobacco constituents.
Oncotarget. 2018 Feb 8;9(24):16899-16916. doi: 10.18632/oncotarget.24458. eCollection 2018 Mar 30.
2
sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides.
Sci Rep. 2016 Aug 25;6:32115. doi: 10.1038/srep32115.
3
Pathway Analysis Revealed Potential Diverse Health Impacts of Flavonoids that Bind Estrogen Receptors.
Int J Environ Res Public Health. 2016 Mar 26;13(4):373. doi: 10.3390/ijerph13040373.
4
A Rat α-Fetoprotein Binding Activity Prediction Model to Facilitate Assessment of the Endocrine Disruption Potential of Environmental Chemicals.
Int J Environ Res Public Health. 2016 Mar 25;13(4):372. doi: 10.3390/ijerph13040372.
5
Understanding and predicting binding between human leukocyte antigens (HLAs) and peptides by network analysis.
BMC Bioinformatics. 2015;16 Suppl 13(Suppl 13):S9. doi: 10.1186/1471-2105-16-S13-S9. Epub 2015 Sep 25.
6
Classification of nucleotide sequences using support vector machines.
J Mol Evol. 2010 Oct;71(4):250-67. doi: 10.1007/s00239-010-9380-9. Epub 2010 Aug 26.