Reciprocal best structure hits: using AlphaFold models to discover distant homologues.

Authors

Monzon Vivian, Paysan-Lafosse Typhaine, Wood Valerie, Bateman Alex

Affiliations

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB21 4HH, UK.

Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK.

Publication information

Bioinform Adv. 2022 Oct 6;2(1):vbac072. doi: 10.1093/bioadv/vbac072. eCollection 2022.

Abstract

MOTIVATION

Conventional methods for detecting homologous protein pairs compare protein sequences. However, the sequences of two homologous proteins may diverge so far that standard approaches can no longer detect the relationship. The release of the AlphaFold 2.0 software enables the prediction of highly accurate protein structures and opens many opportunities to advance our understanding of protein functions, including the detection of homologous protein structure pairs.

RESULTS

In this proof-of-concept work, we search for the closest homologous protein pairs using the structure models of five model organisms from the AlphaFold database. We compare the results with homologous protein pairs detected by their sequence similarity and show that the structural matching approach finds a similar set of results. In addition, we detect potential novel homologs solely with the structural matching approach, which can help to understand the function of uncharacterized proteins and to make previously overlooked connections between well-characterized proteins. We also observe limitations of our implementation of the structure-based approach, particularly when handling highly disordered proteins or short protein structures. Our work shows that high-accuracy protein structure models can be used to discover homologous protein pairs, and we identify areas for improvement of this structural matching approach.
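To make the "reciprocal best hit" idea in the title concrete, the following is a minimal sketch of the general technique, not the authors' implementation. It assumes that a structure-comparison tool has already produced hit lists in both directions between two proteomes, and that each hit carries a similarity score where higher means more similar. The `Hit` tuple layout, the function names and the protein identifiers are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed hit layout: (query_id, target_id, score); a higher score is a better match.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, Tuple[str, float]]:
    """Return the top-scoring target for each query (first hit wins on ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, target, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return best


def reciprocal_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]
) -> List[Tuple[str, str]]:
    """Return sorted pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = []
    for a, (b, _score) in best_ab.items():
        # Keep the pair only if the best hit is mutual.
        if b in best_ba and best_ba[b][0] == a:
            pairs.append((a, b))
    return sorted(pairs)


# Hypothetical hit lists for two small proteomes A and B.
a_vs_b = [("A1", "B1", 0.90), ("A1", "B2", 0.50), ("A2", "B2", 0.70), ("A3", "B1", 0.60)]
b_vs_a = [("B1", "A1", 0.88), ("B2", "A3", 0.40), ("B2", "A2", 0.72)]

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input, A3's best hit is B1, but B1's best hit is A1, so A3 is left unpaired. Requiring the best hit to be mutual is what makes the criterion conservative; how ties and score thresholds are handled is a design choice that this sketch leaves at its simplest.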

AVAILABILITY AND IMPLEMENTATION

Information on the discovered homologous protein pairs can be found at the following URL: https://doi.org/10.17863/CAM.87873. The code can be accessed here: https://github.com/VivianMonzon/Reciprocal_Best_Structure_Hits.

SUPPLEMENTARY INFORMATION

Supplementary data are available online.
