Department of Microbiology and Immunology, University of North Carolina, Chapel Hill, North Carolina, USA.
Department of Pathology, University of Virginia, Charlottesville, Virginia, USA.
Proteins. 2022 Nov;90(11):1973-1986. doi: 10.1002/prot.26390. Epub 2022 Jun 20.
Domains are the three-dimensional building blocks of proteins. An individual domain can occur in a variety of domain architectures that perform unique functions and are subject to different evolutionary selective pressures. We describe an approach to evaluate the variability in amino acid sequences of a single domain across architectural contexts. The ability to distinguish different evolutionary outcomes of one protein domain can help determine whether existing knowledge about a specific domain will apply to an uncharacterized protein, lead to insights and hypotheses about function, and guide experimental priorities. We developed and tested our approach on CheW-like domains (PF01584), which mediate protein-protein interactions and are difficult to compare experimentally. CheW-like domains occur in CheW scaffolding proteins, CheA kinases, and CheV proteins that regulate bacterial chemotaxis. We analyzed 16 domain architectures that included 94% of all CheW-like domains found in nature. We identified six Classes of CheW-like domains with presumed functional differences. CheV and most CheW proteins contained Class 1 domains, whereas some CheW proteins contained Class 6 (20%) or Class 2 (1%) domains instead. Most CheA proteins contained Class 3 domains. CheA proteins with multiple Hpt domains contained Class 4 domains. CheA proteins with two CheW-like domains contained one Class 3 domain and one Class 5 domain. We also created SimpLogo, an innovative method for visualizing amino acid composition across large sets of multiple sequence alignments of arbitrary length. SimpLogo offers substantial advantages over standard sequence logos for comparison and analysis of related protein sequences. The R package for SimpLogo is freely available.
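As a rough illustration of what comparing amino acid composition across architectural contexts can involve, the following Python sketch computes per-column residue frequencies for two groups of aligned sequences and scores how much each column differs between the groups. It is not the authors' pipeline and not the SimpLogo R package: the function names, the choice of total variation distance as the per-column score, and the toy sequences are all assumptions made for this example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment):
    """Return one dict per alignment column mapping residue -> relative frequency.

    Gaps and any non-standard symbols are ignored, so each column's
    frequencies are relative to the residues actually observed there.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    freqs = []
    for col in range(length):
        counts = Counter(
            seq[col].upper() for seq in alignment if seq[col].upper() in AMINO_ACIDS
        )
        total = sum(counts.values())
        freqs.append({aa: n / total for aa, n in counts.items()} if total else {})
    return freqs


def column_divergence(group_a, group_b):
    """Per-column total variation distance between two groups of aligned sequences.

    0.0 means identical composition, 1.0 means no residues in common.
    Columns with no residues in either group yield None.
    (Assumption: this score is chosen for simplicity, not taken from the paper.)
    """
    freqs_a = column_frequencies(group_a)
    freqs_b = column_frequencies(group_b)
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both groups must come from the same alignment")
    scores = []
    for pa, pb in zip(freqs_a, freqs_b):
        if not pa or not pb:
            scores.append(None)
            continue
        scores.append(
            0.5 * sum(abs(pa.get(aa, 0.0) - pb.get(aa, 0.0)) for aa in AMINO_ACIDS)
        )
    return scores


# Hypothetical toy alignment: the same three columns of one domain,
# taken from two different (made-up) domain architectures.
architecture_a = ["GKV", "GRV", "GKV"]
architecture_b = ["GDI", "GDV", "G-I"]
divergence = column_divergence(architecture_a, architecture_b)
```

Columns with scores near zero are conserved across both contexts, while high-scoring columns are candidates for positions that diverged between architectures. A real analysis would start from a curated multiple sequence alignment and would likely weight sequences to correct for redundancy.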