Monzon Vivian, Paysan-Lafosse Typhaine, Wood Valerie, Bateman Alex
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB21 4HH, UK.
Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK.
Bioinform Adv. 2022 Oct 6;2(1):vbac072. doi: 10.1093/bioadv/vbac072. eCollection 2022.
Conventional methods for detecting homologous protein pairs compare protein sequences. However, the sequences of two homologous proteins may have diverged so far that standard approaches cannot detect the relationship. The release of the AlphaFold 2.0 software enables highly accurate prediction of protein structures and opens many opportunities to advance our understanding of protein function, including the detection of homologous protein pairs from their structures.
In this proof-of-concept work, we search for the closest homologous protein pairs using the structure models of five model organisms from the AlphaFold database. We compare the results with homologous protein pairs detected by sequence similarity and show that the structural matching approach finds a similar set of results. In addition, we detect potential novel homologs solely with the structural matching approach, which can help to understand the function of uncharacterized proteins and to make previously overlooked connections between well-characterized proteins. We also observe limitations of our implementation of the structure-based approach, particularly when handling highly disordered proteins or short protein structures. Our work shows that high-accuracy protein structure models can be used to discover homologous protein pairs, and we expose areas for improvement of this structural matching approach.
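As an illustration of how closest homologous pairs between two proteomes might be selected, the following minimal Python sketch applies a reciprocal best hit rule to precomputed similarity scores. It is not the authors' implementation: the abstract does not describe the selection rule, and the reciprocal criterion is assumed here only from the name of the linked repository. The function names, the score dictionaries, and the protein identifiers are all hypothetical, and the scores could come from any structure or sequence comparison tool.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: query protein -> {target protein: similarity score}.
Hits = Dict[str, Dict[str, float]]


def best_hit(hits: Hits) -> Dict[str, str]:
    """Map each query to its single highest-scoring target."""
    return {
        query: max(targets, key=targets.get)
        for query, targets in hits.items()
        if targets  # skip queries with no hits at all
    }


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hit(a_vs_b)
    best_ba = best_hit(b_vs_a)
    return sorted(
        (a, b) for a, b in best_ab.items() if best_ba.get(b) == a
    )


# Toy scores for two made-up proteomes (illustrative values only).
a_vs_b = {
    "A1": {"B1": 0.9, "B2": 0.4},
    "A2": {"B2": 0.7, "B1": 0.6},
    "A3": {"B1": 0.8},
}
b_vs_a = {
    "B1": {"A1": 0.85, "A3": 0.5},
    "B2": {"A2": 0.65},
}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy example A3 is not paired, because its best hit B1 prefers A1. Running the same selection on structure-derived scores and on sequence-derived scores would give two pair sets that can then be compared, in the spirit of the comparison described above.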
Information on the discovered homologous protein pairs can be found at the following URL: https://doi.org/10.17863/CAM.87873. The code can be accessed here: https://github.com/VivianMonzon/Reciprocal_Best_Structure_Hits.
Supplementary data are available at Bioinformatics Advances online.