Structural Biology Group, Biological and Chemical Research Centre, Faculty of Chemistry, University of Warsaw, Żwirki i Wigury 101, 02-089 Warsaw, Poland.
Instituto de Investigação e Inovação em Saúde and Instituto de Biologia Molecular e Celular (IBMC), Universidade do Porto, 4200-135 Porto, Portugal.
Biomolecules. 2022 Jun 7;12(6):793. doi: 10.3390/biom12060793.
A conserved 26-residue sequence [AA(X)[A/G]G/LGDVI/L[V/L]NGE(X)V(X)] and a corresponding repeating structural module were identified within the HtrA protease family using a non-redundant set (N = 20) of publicly available structures. Although the repeats were far from sequence-perfect, their conservation was statistically significant. Three or more repetitions were identified within each protein, even though such a repeat is statistically expected to occur by chance only once per 10^31 residues. This sequence repeat was associated with a six-stranded antiparallel β-barrel module. Two such modules form the core of the structures of the PA clan of serine proteases, and a modified version of the module could be identified in the PDZ-like domains. Automated structural alignment methods had difficulty superimposing these β-barrels, but using a human HtrA2 structure as the target showed that the modules had an average RMSD of less than 2 Å across the set of structures (by both mean and median). Our findings support Dayhoff's hypothesis that complex proteins arose through duplication of simpler peptide motifs and domains.
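The abstract reports RMSD values for β-barrel modules superimposed onto a human HtrA2 target but does not say which superposition method was used. A common choice is least-squares rigid-body superposition (the Kabsch algorithm) followed by RMSD over equivalent Cα atoms. The sketch below illustrates that generic technique and is not the authors' pipeline. It assumes NumPy and Cα coordinates that are already paired residue by residue. The function name `kabsch_rmsd` and the example coordinates are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition.

    Hypothetical helper: P and Q must list equivalent atoms (e.g. C-alpha
    atoms of residues already matched between two modules) in the same order.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centring both sets on their centroids.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)

    # 3x3 covariance matrix H = sum_i p_i q_i^T and its SVD.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so R is a proper rotation (det = +1),
    # never a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate P onto Q, then take the root-mean-square deviation per atom.
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Hypothetical example (coordinates in Å): Q is P rotated 90° about z and shifted,
# so the optimal superposition recovers it exactly.
P = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0], [0.0, 1.5, 1.0]])
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([10.0, -3.0, 2.0])
print(round(kabsch_rmsd(P, Q), 6))  # 0.0
```

The function assumes the residue correspondence between two modules is already known. In the abstract's terms, establishing that correspondence is the step automated structural aligners struggled with for these β-barrels. The superposition and RMSD themselves are straightforward once the pairing is fixed. Per-module values could then be summarised by mean and median, as the abstract does.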