CLAP: a web-server for automatic classification of proteins with special reference to multi-domain proteins.

Author information

Gnanavel Mutharasu, Mehrotra Prachi, Rakshambikai Ramaswamy, Martin Juliette, Srinivasan Narayanaswamy, Bhaskara Ramachandra M

Affiliation

Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India.

Publication information

BMC Bioinformatics. 2014 Oct 4;15(1):343. doi: 10.1186/1471-2105-15-343.

Abstract

BACKGROUND

The function of a protein can be deciphered with higher accuracy from its structure than from its amino acid sequence. Because far more protein sequences than structures are available, tools that can generate functionally homogeneous clusters using only sequence information are of great importance. Traditional alignment-based tools, which cluster proteins on the basis of sequence similarity, work well in most cases. For multi-domain proteins, however, alignment quality may be poor because of varied protein lengths, domain shuffling or circular permutations. Multi-domain proteins are ubiquitous in nature; hence alignment-free tools, which overcome the shortcomings of alignment-based protein comparison methods, are required. Further, existing tools classify proteins using only domain-level information and hence miss the information encoded in the tethered regions or accessory domains. Our method, in contrast, takes into account the full-length sequence of a protein, consolidating the complete sequence information to understand a given protein better.

RESULTS

Our web-server, CLAP (Classification of Proteins), is one such alignment-free software for automatic classification of protein sequences. It uses a pattern-matching algorithm that assigns local matching scores (LMS) to residues that are part of the matched patterns between the two sequences being compared. CLAP works on full-length sequences and does not require prior domain definitions. Pilot studies undertaken previously on protein kinases and immunoglobulins have shown that CLAP yields clusters that have high functional and domain architectural similarity. Moreover, parsing at a statistically determined cut-off resulted in clusters that agreed with the sub-family level classification of that particular domain family.
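The general idea of alignment-free comparison followed by clustering at a cut-off can be illustrated with a minimal Python sketch. This is not the CLAP algorithm: the abstract does not specify how patterns are matched or how LMS values are computed, so the sketch substitutes a simple shared k-mer coverage score. The function names, the pattern length `k`, the cut-off value and the toy sequences are all assumptions made for illustration.

```python
from itertools import combinations


def shared_kmer_positions(seq_a, seq_b, k=3):
    """Return the residue indices of seq_a and seq_b covered by shared k-mers."""
    kmers_a = {}
    for i in range(len(seq_a) - k + 1):
        kmers_a.setdefault(seq_a[i:i + k], []).append(i)
    covered_a, covered_b = set(), set()
    for j in range(len(seq_b) - k + 1):
        starts = kmers_a.get(seq_b[j:j + k])
        if starts:
            covered_b.update(range(j, j + k))
            for i in starts:
                covered_a.update(range(i, i + k))
    return covered_a, covered_b


def similarity(seq_a, seq_b, k=3):
    """Fraction of residues (over both sequences) lying in a shared k-mer.

    Assumption: a stand-in for a per-residue matching score, not CLAP's LMS.
    """
    if len(seq_a) < k or len(seq_b) < k:
        return 0.0
    covered_a, covered_b = shared_kmer_positions(seq_a, seq_b, k)
    return (len(covered_a) + len(covered_b)) / (len(seq_a) + len(seq_b))


def cluster(seqs, cutoff, k=3):
    """Single-linkage clustering: link pairs whose similarity >= cutoff."""
    parent = list(range(len(seqs)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(seqs)), 2):
        if similarity(seqs[i], seqs[j], k) >= cutoff:
            parent[find(i)] = find(j)
    groups = {}
    for idx in range(len(seqs)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


# Toy (made-up) sequences: b is a with its two halves swapped,
# mimicking domain shuffling; c is unrelated.
a = "MKTAYIAKQR" + "GGSLLNDWEF"
b = "GGSLLNDWEF" + "MKTAYIAKQR"
c = "PPPPPPPPPPHHHHHHHHHH"
print(similarity(a, b), similarity(a, c))
print(cluster([a, b, c], cutoff=0.5))
```

Because the score depends only on which short patterns are shared, not on where they occur, the two domain-swapped toy sequences score as highly similar even though a global alignment of them would be poor. The fixed cut-off of 0.5 is arbitrary here; the abstract states only that CLAP's cut-off is statistically determined.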

CONCLUSIONS

CLAP is a useful protein-clustering tool, independent of domain assignment, domain order, sequence length and domain diversity. Our method can be used for any set of protein sequences, yielding functionally relevant clusters with high domain architectural homogeneity. The CLAP web server is freely available for academic use at http://nslab.mbu.iisc.ernet.in/clap/.
