CLAP: a web-server for automatic classification of proteins with special reference to multi-domain proteins.

Authors

Gnanavel Mutharasu, Mehrotra Prachi, Rakshambikai Ramaswamy, Martin Juliette, Srinivasan Narayanaswamy, Bhaskara Ramachandra M

Affiliation

Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India.

Publication information

BMC Bioinformatics. 2014 Oct 4;15(1):343. doi: 10.1186/1471-2105-15-343.

DOI:10.1186/1471-2105-15-343
PMID:25282152
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4287353/
Abstract

BACKGROUND

The function of a protein can be deciphered with higher accuracy from its structure than from its amino acid sequence. Because of the huge gap between the available protein sequence space and structural space, tools that can generate functionally homogeneous clusters using only sequence information are of great importance. Traditional alignment-based tools, which cluster on the basis of sequence similarity, work well for this purpose in most cases. However, for multi-domain proteins the alignment quality might be poor due to varied protein lengths, domain shuffling or circular permutations. Multi-domain proteins are ubiquitous in nature, hence alignment-free tools, which overcome the shortcomings of alignment-based protein comparison methods, are required. Further, existing tools classify proteins using only domain-level information and hence miss the information encoded in the tethered regions or accessory domains. Our method, on the other hand, takes into account the full-length sequence of a protein, consolidating the complete sequence information to understand a given protein better.

RESULTS

Our web-server, CLAP (Classification of Proteins), is one such alignment-free software for automatic classification of protein sequences. It utilizes a pattern-matching algorithm that assigns local matching scores (LMS) to residues that are a part of the matched patterns between two sequences being compared. CLAP works on full-length sequences and does not require prior domain definitions. Pilot studies undertaken previously on protein kinases and immunoglobulins have shown that CLAP yields clusters, which have high functional and domain architectural similarity. Moreover, parsing at a statistically determined cut-off resulted in clusters that corroborated with the sub-family level classification of that particular domain family.
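
To make the idea of scoring residues that fall inside shared patterns more concrete, the following minimal Python sketch uses exact shared k-mers as a stand-in for pattern matching. It is not the CLAP algorithm or its LMS formula, which the abstract does not specify. The function names, the choice of k = 3, and the coverage-based distance are all assumptions made for illustration.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map every length-k substring of seq to the list of its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return positions


def residue_scores(a, b, k=3):
    """Hypothetical per-residue score for `a` (NOT the published LMS).

    Each residue gets +1 for every k-mer of `a` that covers it and that
    also occurs anywhere in `b`, regardless of where it occurs in `b`.
    """
    shared = set(kmer_positions(b, k))
    scores = [0] * len(a)
    for i in range(len(a) - k + 1):
        if a[i:i + k] in shared:
            for j in range(i, i + k):
                scores[j] += 1
    return scores


def distance(a, b, k=3):
    """1 minus the fraction of residues, over both sequences, that lie in a shared k-mer."""
    covered = sum(s > 0 for s in residue_scores(a, b, k))
    covered += sum(s > 0 for s in residue_scores(b, a, k))
    total = len(a) + len(b)
    return 1.0 - covered / total if total else 0.0


if __name__ == "__main__":
    # Two toy "domains" placed in opposite order in the two sequences.
    dom1, dom2 = "ACDEFGHIK", "LMNPQRSTV"
    print(distance(dom1 + dom2, dom2 + dom1))
    print(residue_scores("ACDE", "CDE"))
```

Because the k-mer lookup ignores position, the two toy sequences with swapped domain order come out at distance 0 in this sketch, which is the kind of order independence an alignment-free comparison is meant to provide for shuffled multi-domain proteins.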

CONCLUSIONS

CLAP is a useful protein-clustering tool, independent of domain assignment, domain order, sequence length and domain diversity. Our method can be used for any set of protein sequences, yielding functionally relevant clusters with high domain architectural homogeneity. The CLAP web server is freely available for academic use at http://nslab.mbu.iisc.ernet.in/clap/.
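
As a rough illustration of how a set of pairwise distances can be parsed into clusters at a cut-off, the sketch below groups items by single-linkage: any two items closer than the cut-off end up in the same cluster. Single-linkage and the union-find bookkeeping are choices made here for brevity; the abstract does not state which clustering procedure or cut-off statistic CLAP uses, and the function name, the toy protein labels, and the distance values are hypothetical.

```python
def single_linkage_clusters(items, dist, cutoff):
    """Group items so that any pair with dist(x, y) <= cutoff shares a cluster."""
    parent = {x: x for x in items}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, x in enumerate(items):
        for y in items[i + 1:]:
            if dist(x, y) <= cutoff:
                parent[find(x)] = find(y)  # merge the two clusters

    clusters = {}
    for x in items:
        clusters.setdefault(find(x), []).append(x)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Made-up pairwise distances between three hypothetical proteins.
    toy = {("p1", "p2"): 0.1, ("p1", "p3"): 0.9, ("p2", "p3"): 0.8}

    def toy_dist(x, y):
        return toy[tuple(sorted((x, y)))]

    print(single_linkage_clusters(["p1", "p2", "p3"], toy_dist, cutoff=0.5))
```

With these made-up distances and a cut-off of 0.5, p1 and p2 fall into one cluster and p3 stays on its own; raising the cut-off to 1.0 merges all three.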
