Manfredi Matteo, Vazzana Gabriele, Savojardo Castrense, Martelli Pier Luigi, Casadio Rita
Biocomputing Group, University of Bologna, Italy.
Dept. of Pharmacy and Biotechnology, University of Bologna, Italy.
Comput Struct Biotechnol J. 2025 Jan 14;27:461-466. doi: 10.1016/j.csbj.2025.01.008. eCollection 2025.
AlphaFold2 predicts protein structures from structural and functional knowledge, whereas ESMFold does the same using protein language models. Here, we map available Pfam domains onto pairs of models of the human reference proteome computed with both procedures and compare the mapped regions relevant for functional annotation. We find that, largely irrespective of the global superimposition of the paired models, Pfam-containing regions overlap with a TM-score above 0.8 and have a predicted local distance difference test (pLDDT) score higher than that of the rest of the modeled sequence. This indicates that both methods perform similarly in modeled regions that overlap Pfam domains, which carry structural and functional information, with slightly higher pLDDT values for AlphaFold2. The mapping of 9834 Pfam domains also makes it possible to locate 2578 active sites in 3382 enzymes of the human proteome, including 807 proteins for which the active site is not reported in UniProt.
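The per-region comparison described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. The input layout (a per-residue pLDDT list and 1-based inclusive Pfam (start, end) ranges) and all function names are hypothetical. `tm_score_fixed` only scores one superposition you already have. The TM-score proper is the maximum of this quantity over superpositions, as computed by tools such as TM-align.

```python
"""Hedged sketch: compare pLDDT inside vs outside Pfam domain regions.

Assumptions (not from the paper): per-residue pLDDT values are a list for
residues 1..L, and Pfam domains are 1-based inclusive (start, end) ranges
on the same sequence. Function names are hypothetical.
"""
from statistics import mean


def pfam_mask(length, domains):
    """Return a boolean list marking residues covered by any Pfam domain."""
    mask = [False] * length
    for start, end in domains:
        # Convert 1-based inclusive coordinates to 0-based indices.
        for i in range(max(start, 1) - 1, min(end, length)):
            mask[i] = True
    return mask


def plddt_inside_outside(plddt, domains):
    """Mean pLDDT over Pfam-covered residues and over the remaining residues."""
    mask = pfam_mask(len(plddt), domains)
    inside = [p for p, m in zip(plddt, mask) if m]
    outside = [p for p, m in zip(plddt, mask) if not m]
    # None marks an empty set (e.g. a protein fully covered by domains).
    return (mean(inside) if inside else None,
            mean(outside) if outside else None)


def tm_score_fixed(distances, length):
    """TM-score value for ONE given superposition (not maximized).

    distances: distances in angstroms between equivalent C-alpha atoms of
    the two models after superposition; length: normalizing length L.
    d0 = 1.24 * (L - 15)^(1/3) - 1.8, floored at 0.5 for short chains
    (the floor is an assumption mirroring common implementations).
    """
    d0 = 1.24 * (length - 15) ** (1 / 3) - 1.8 if length > 15 else 0.5
    d0 = max(d0, 0.5)
    return sum(1 / (1 + (d / d0) ** 2) for d in distances) / length


if __name__ == "__main__":
    scores = [90.0, 92.0, 88.0, 40.0, 35.0, 85.0]
    # Two hypothetical domains: residues 1-3 and residue 6.
    print(plddt_inside_outside(scores, [(1, 3), (6, 6)]))  # (88.75, 37.5)
```

Splitting residues with a mask keeps the comparison independent of how many domains a protein has. Overlapping domain ranges are simply counted once.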