Department of Biochemistry, Faculty of Medicine and Health Sciences, Université de Sherbrooke, 12e Avenue Nord, Sherbrooke, Québec, Canada.
Bioinformatics. 2012 Jun 1;28(11):1438-45. doi: 10.1093/bioinformatics/bts149. Epub 2012 Mar 30.
A growing body of evidence from experimental and computational analyses suggests that rare codon clusters are functionally important for protein activity. Most studies of rare codon clusters have been performed on a limited number of proteins or protein families. Here we present the Sherlocc program and show how it can be used for large-scale analysis of evolutionarily conserved rare codon clusters across protein families, and of their relation to protein function and structure. The analysis was performed on the whole Pfam database, which covers over 70% of the known protein sequence universe. Sherlocc detects statistically relevant conserved rare codon clusters and produces user-friendly HTML output.
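To make the notion of a rare codon cluster concrete, the following is a minimal sketch of one common way to flag such regions: slide a fixed-size window along a coding sequence and report windows whose mean codon frequency falls below a cutoff. It is not Sherlocc's algorithm, which this abstract does not specify and which additionally evaluates conservation across Pfam family alignments. The function names, the window size, and the threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def codons(seq):
    """Split a coding sequence into complete codons (trailing bases are dropped)."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def codon_frequencies(reference_seqs):
    """Estimate relative codon frequencies from a set of reference coding sequences."""
    counts = Counter()
    for s in reference_seqs:
        counts.update(codons(s))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def rare_codon_windows(seq, freqs, window=7, threshold=0.02):
    """Return (start_codon_index, mean_frequency) for each window whose mean
    codon frequency is below `threshold`.

    `window` and `threshold` are illustrative defaults, not values taken
    from the Sherlocc publication. Codons absent from `freqs` count as 0.
    """
    cs = codons(seq)
    hits = []
    for start in range(len(cs) - window + 1):
        chunk = cs[start:start + window]
        mean = sum(freqs.get(c, 0.0) for c in chunk) / window
        if mean < threshold:
            hits.append((start, mean))
    return hits


if __name__ == "__main__":
    # Toy reference set in which GCT is common and CGA is rare.
    reference = ["GCT" * 9 + "CGA"]
    freqs = codon_frequencies(reference)
    query = "GCT" * 3 + "CGA" * 3 + "GCT" * 3
    for start, mean in rare_codon_windows(query, freqs, window=3, threshold=0.2):
        print(f"rare window starting at codon {start}: mean frequency {mean:.3f}")
```

A single-sequence scan like this only marks locally rare stretches. Deciding whether a cluster is conserved and statistically significant across a protein family, as the abstract describes, would require comparing positions across aligned family members and a suitable null model, neither of which is attempted here.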
Statistically significant rare codon clusters were detected in a multitude of Pfam protein families. The most statistically significant clusters were predominantly identified in N-terminal Pfam families. Many of the longest rare codon clusters are found in membrane-related proteins that must interact with other proteins as part of their function, for example in targeting or insertion. We identified some cases in which rare codon clusters can play a regulatory role in the folding of catalytically important domains. Our results support the existence of a widespread functional role for rare codon clusters across species. Finally, we developed an online filter-based search interface that provides access to Sherlocc results for all Pfam families.
The Sherlocc program and search interface are open access and are available at http://bcb.med.usherbrooke.ca