Szczerbiak Paweł, Szydlowski Lukasz M, Wydmański Witold, Renfrew P Douglas, Leman Julia Koehler, Kosciolek Tomasz
Sano Centre for Computational Medicine, Kraków, Poland.
Małopolska Centre of Biotechnology, Jagiellonian University, Kraków, Poland.
Nat Commun. 2025 Aug 25;16(1):7925. doi: 10.1038/s41467-025-63250-3.
Recent breakthroughs in protein structure prediction have led to a surge in high-quality 3D models, highlighting the need for efficient computational solutions to analyze them. In this work, we examine the structural clusters from the AlphaFold Protein Structure Database (AFDB), a high-quality subset of ESMAtlas, and the Microbiome Immunity Project (MIP), and create a single cohesive low-dimensional representation of the resulting protein space. We show that, although each database occupies distinct regions of this space, their functional profiles overlap significantly. High-level biological functions tend to cluster in particular regions, revealing a shared functional landscape despite the diverse sources of data. By creating a representation of protein structure space, localizing functional annotations within it, and providing an open-access web server for exploration, this work offers insights for future research on protein sequence-structure-function relationships and enables biological questions to be asked about taxonomic assignments, environmental factors, or functional specificity. The approach is generalizable, enabling further discovery beyond the findings presented here.