Clustering of multi-domain protein sequences.

Affiliations

Indian Institute of Science Mathematics Initiative, Bangalore, 560012, India.

Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India.

Publication information

Proteins. 2018 Jul;86(7):759-776. doi: 10.1002/prot.25510. Epub 2018 May 6.

Abstract

The overall function of a multi-domain protein is determined by the functional and structural interplay of its constituent domains. Traditional sequence alignment-based methods commonly use domain-level information and provide classification only at the level of individual domains. Such methods cannot take into account the contributions of the other domains in a protein or of the domain-linker regions, and therefore cannot classify multi-domain proteins as a whole. An alignment-free protein sequence comparison tool, CLAP (CLAssification of Proteins), was previously developed in our laboratory specifically to handle multi-domain protein sequences without requiring the domain boundaries or the sequential order of domains to be defined. Through this method we aim to achieve a biologically meaningful classification scheme for multi-domain protein sequences. In this article, CLAP-based classification has been explored on 5 datasets of multi-domain proteins, and we present a detailed analysis of proteins containing (1) a tyrosine phosphatase domain and (2) an SH3 domain. At the domain level, the CLAP-based classification scheme resulted in a clustering similar to that obtained from an alignment-based method. CLAP-based clusters obtained for full-length datasets were shown to comprise proteins with similar functions and domain architectures. Our study demonstrates that multi-domain proteins can be classified effectively by considering full-length sequences, without requiring the domains in the sequence to be identified.
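To illustrate what "alignment-free" comparison of full-length sequences means in general, here is a minimal sketch that reduces each sequence to counts of its short substrings (k-mers), scores pairs by cosine similarity, and groups sequences by single linkage above a threshold. This is a generic stand-in, not CLAP's scoring scheme, which the abstract does not describe; the k-mer size, the cosine measure, the 0.5 threshold, the function names, and the toy sequences are all hypothetical choices made for illustration.

```python
# Generic alignment-free sketch (k-mer profiles + cosine similarity + single linkage).
# NOT the CLAP algorithm; all names, parameters and sequences here are hypothetical.
from collections import Counter
from math import sqrt


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer; the order of segments (domains) is discarded."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two k-mer count vectors (0.0 if either is empty)."""
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(seqs: dict[str, str], k: int = 3, threshold: float = 0.5) -> list[set[str]]:
    """Link every pair whose similarity meets the threshold, then return the
    connected components (single-linkage clusters) via union-find."""
    names = list(seqs)
    profiles = {n: kmer_profile(seqs[n], k) for n in names}
    parent = {n: n for n in names}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


# Hypothetical toy data: p1 and p2 carry the same two segments in swapped order.
seqs = {
    "p1": "MKTAYIAKQRGSHMLEDPVW",  # toy segment A followed by toy segment B
    "p2": "GSHMLEDPVWMKTAYIAKQR",  # toy segment B followed by toy segment A
    "p3": "CCCCCCCCCCNNNNNNNNNN",  # unrelated toy sequence
}
clusters = cluster(seqs, k=3, threshold=0.5)
print(sorted(sorted(g) for g in clusters))  # [['p1', 'p2'], ['p3']]
```

Because each sequence becomes an unordered bag of k-mers, swapping the two toy segments changes only the k-mers that span the junction, so p1 and p2 still share 16 of their 18 distinct 3-mers (cosine 16/18 ≈ 0.89) and end up in the same cluster. Real tools differ in how they score local similarity; the sketch only shows why domain boundaries and domain order need not be specified.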
