Clustering the annotation space of proteins.

Authors

Kunin Victor, Ouzounis Christos A

Affiliation

Computational Genomics Group, EMBL-EBI, Cambridge, CB10 1SD, UK.

Publication information

BMC Bioinformatics. 2005 Feb 9;6:24. doi: 10.1186/1471-2105-6-24.

Abstract

BACKGROUND

Current protein clustering methods rely on either sequence or functional similarities between proteins, thereby limiting inferences to one of these areas.

RESULTS

Here we report a new approach, named CLAN, which clusters proteins according to both annotation and sequence similarity. This approach is extremely fast, clustering the complete SwissProt database within minutes. It is also accurate, recovering consistent protein families with an average agreement of more than 97% with sequence-based protein families from Pfam. Discrepancies between sequence- and annotation-based clusters were scrutinized and the reasons reported. We give examples of each of these cases and discuss in detail one example of a propagated error in SwissProt: a vacuolar ATPase subunit M9.2 erroneously annotated as vacuolar ATP synthase subunit H. The CLAN algorithm is available from the authors, and the CLAN database is accessible at http://maine.ebi.ac.uk:8000/cgi-bin/clan/ClanSearch.pl

CONCLUSIONS

CLAN creates refined function-and-sequence-specific protein families that can be used to identify and annotate unknown family members. It also allows easy identification of erroneous annotations by spotting inconsistencies between similarities at the annotation and sequence levels.
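The idea of spotting inconsistencies between annotation-level and sequence-level groupings can be illustrated with a minimal Python sketch. This is not the CLAN algorithm, whose details the abstract does not give. It assumes each protein already carries an annotation-cluster label and a sequence-family label, and it flags members whose sequence family differs from the majority family of their annotation cluster. The function name `cluster_consistency`, the protein identifiers, and the cluster and family labels are all hypothetical.

```python
from collections import Counter, defaultdict


def cluster_consistency(annotation_cluster, sequence_family):
    """For each annotation cluster, report the majority sequence family,
    the fraction of members in that family, and the members outside it.

    Both arguments map protein ID -> label. Illustrative only; this is
    not the published CLAN procedure.
    """
    members = defaultdict(list)
    for protein, cluster in annotation_cluster.items():
        members[cluster].append(protein)

    report = {}
    for cluster, proteins in members.items():
        families = Counter(sequence_family[p] for p in proteins)
        majority, count = families.most_common(1)[0]
        # Members whose sequence family disagrees with the majority are
        # candidates for manual review as possible annotation errors.
        outliers = sorted(p for p in proteins if sequence_family[p] != majority)
        report[cluster] = {
            "majority_family": majority,
            "agreement": count / len(proteins),
            "outliers": outliers,
        }
    return report


# Hypothetical toy data: four proteins share one annotation, but one of
# them belongs to the sequence family of a differently annotated protein.
annotation_cluster = {
    "prot1": "annot_X",
    "prot2": "annot_X",
    "prot3": "annot_X",
    "prot4": "annot_X",
    "prot5": "annot_Y",
}
sequence_family = {
    "prot1": "fam_A",
    "prot2": "fam_A",
    "prot3": "fam_A",
    "prot4": "fam_B",
    "prot5": "fam_B",
}

report = cluster_consistency(annotation_cluster, sequence_family)
for cluster, stats in report.items():
    print(cluster, stats)
```

In this toy input, `prot4` is flagged because it shares its annotation with `fam_A` proteins but its sequence family with `prot5`. This is the kind of disagreement the abstract describes as a hint of a propagated annotation error. A flagged protein is only a candidate for review, since the mismatch could also reflect a genuine multi-family function.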
