National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA.
National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA.
Biopolymers. 2021 Oct;112(10):e23416. doi: 10.1002/bip.23416. Epub 2021 Jan 19.
Although most experimentally characterized proteins with similar sequences adopt the same folds and perform similar functions, a growing number of exceptions are emerging. One class of exceptions comprises sequence-similar fold switchers, whose secondary structures switch between α-helix and β-sheet in response to a small number of mutations, a sequence insertion, or a deletion. Methods for predicting sequence-similar fold switchers are desirable because some are associated with disease and/or can perform different functions in cells. Here, we use homology-based secondary structure predictions to identify sequence-similar fold switchers from their amino acid sequences alone. To do this, we predicted the secondary structures of sequence-similar fold switchers with three homology-based secondary structure predictors: PSIPRED, JPred4, and SPIDER3. We found that α-helix ↔ β-strand prediction discrepancies from JPred4 discriminated between the different conformations of sequence-similar fold switchers with high statistical significance (P < 1.8*10 ). We therefore used these discrepancies as a classifier and found that they can often robustly discriminate between sequence-similar fold switchers and sequence-similar proteins that maintain the same folds (Matthews correlation coefficient of 0.82). We found that JPred4 is a more robust predictor of sequence-similar fold switchers than PSIPRED or SPIDER3 because of (a) the curated sequence database it uses to produce multiple sequence alignments and (b) its use of sequence profiles based on Hidden Markov Models. Our results indicate that inconsistencies between JPred4 secondary structure predictions can be used to identify some sequence-similar fold switchers from their sequences alone. Thus, the negative information carried by inconsistent secondary structure predictions could be leveraged to identify sequence-similar fold switchers across the broad base of genomic sequences.
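The discrepancy-based classification can be illustrated with a minimal Python sketch. It assumes each protein's secondary structure prediction is a per-residue string already aligned to its partner's, using a JPred4-style encoding (H = helix, E = strand, any other character = coil). It counts aligned positions predicted as helix in one protein and strand in the other, and flags a pair as fold switchers when that count reaches a threshold. The function names, the pre-aligned input, and the threshold of 3 are hypothetical choices for illustration, not the authors' published criteria; the Matthews correlation coefficient (MCC) uses its standard confusion-matrix formula.

```python
from math import sqrt

# Assumed encoding (JPred4-style): "H" = helix, "E" = strand, anything else = coil.
HELIX, STRAND = "H", "E"


def helix_strand_discrepancies(ss_a: str, ss_b: str) -> int:
    """Count aligned positions predicted as helix in one protein and strand in the other."""
    if len(ss_a) != len(ss_b):
        raise ValueError("predictions must be aligned to the same length")
    return sum(1 for a, b in zip(ss_a, ss_b) if {a, b} == {HELIX, STRAND})


def predict_fold_switch(ss_a: str, ss_b: str, min_discrepancies: int = 3) -> bool:
    """Hypothetical rule: flag a pair as fold switchers if H<->E mismatches reach a threshold."""
    return helix_strand_discrepancies(ss_a, ss_b) >= min_discrepancies


def matthews_corrcoef(y_true: list[bool], y_pred: list[bool]) -> float:
    """Standard MCC from a binary confusion matrix; returns 0.0 if any marginal sum is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy, made-up predictions for aligned protein pairs: (ss_a, ss_b, is_fold_switcher)
    examples = [
        ("HHHHHHCCEEEE", "EEEEHHCCEEEE", True),
        ("HHHHCCCCEEEE", "HHHHCCCCEEEE", False),
        ("HHHHHHHHCCCC", "HHHHHHHECCCC", False),
        ("CCEEEEEECCHH", "CCHHHHHHCCHH", True),
        ("HHHHCCCC", "HHHECCCC", True),
    ]
    y_true = [label for _, _, label in examples]
    y_pred = [predict_fold_switch(a, b) for a, b, _ in examples]
    print(f"MCC = {matthews_corrcoef(y_true, y_pred):.2f}")
```

With these made-up pairs (not data from the study), the last pair has only one helix/strand mismatch and is missed, so four of five pairs are classified correctly and the MCC is 4/6 ≈ 0.67. Counting only H ↔ E mismatches, rather than any disagreement, mirrors the α-helix ↔ β-strand discrepancies described above; coil disagreements are ignored.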