Multi-scale structural similarity embedding search across entire proteomes.

Author information

Segura Joan, Sanchez-Garcia Ruben, Bittrich Sebastian, Rose Yana, Burley Stephen K, Duarte Jose M

Affiliations

Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA.

School of Science and Technology, IE University, Paseo de la Castellana 259, 28046 Madrid, Spain.

Publication information

bioRxiv. 2025 Mar 6:2025.02.28.640875. doi: 10.1101/2025.02.28.640875.

Abstract

The rapid expansion of three-dimensional (3D) biomolecular structure information, driven by breakthroughs in artificial intelligence/deep learning (AI/DL)-based structure predictions, has created an urgent need for scalable and efficient structure similarity search methods. Traditional alignment-based approaches, such as structural superposition tools, are computationally expensive and challenging to scale with the vast number of available macromolecular structures. Herein, we present a scalable structure similarity search strategy designed to navigate extensive repositories of experimentally determined structures and computed structure models predicted using AI/DL methods. Our approach leverages protein language models and a deep neural network architecture to transform 3D structures into fixed-length vectors, enabling efficient large-scale comparisons. Although trained to predict TM-scores between single-domain structures, our model generalizes beyond the domain level, accurately identifying 3D similarity for full-length polypeptide chains and multimeric assemblies. By integrating vector databases, our method facilitates efficient large-scale structure retrieval, addressing the growing challenges posed by the expanding volume of 3D biostructure information.
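The abstract does not describe the retrieval step in code. The sketch below shows the general idea of searching fixed-length embeddings by vector similarity. It assumes each structure has already been mapped to an embedding by some model (not shown) and that hits are ranked by cosine similarity. `BruteForceIndex`, the chain IDs, and the toy 3-dimensional vectors are hypothetical and are not the authors' model, embedding size, or database.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class BruteForceIndex:
    """Hypothetical exact cosine-similarity index over fixed-length embeddings.

    This stands in for a vector database. It scans every stored vector per
    query, while real vector databases usually rely on approximate
    nearest-neighbor indexes to scale to very large collections.
    """

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._vectors: Dict[str, List[float]] = {}

    def add(self, entry_id: str, embedding: Sequence[float]) -> None:
        # Every embedding must have the same fixed length.
        if len(embedding) != self.dim:
            raise ValueError(f"expected length {self.dim}, got {len(embedding)}")
        self._vectors[entry_id] = normalize(embedding)

    def search(self, query: Sequence[float], k: int = 5) -> List[Tuple[str, float]]:
        """Return up to k (entry_id, cosine similarity) pairs, best first."""
        if len(query) != self.dim:
            raise ValueError(f"expected length {self.dim}, got {len(query)}")
        q = normalize(query)
        scored = (
            (sum(a * b for a, b in zip(q, v)), entry_id)
            for entry_id, v in self._vectors.items()
        )
        return [(entry_id, score) for score, entry_id in heapq.nlargest(k, scored)]


# Usage with toy, made-up embeddings (hypothetical chain IDs).
index = BruteForceIndex(dim=3)
index.add("chain_A", [1.0, 0.0, 0.0])
index.add("chain_B", [0.9, 0.1, 0.0])
index.add("chain_C", [0.0, 1.0, 0.0])
hits = index.search([1.0, 0.05, 0.0], k=2)
print([entry_id for entry_id, _ in hits])  # prints ['chain_A', 'chain_B']
```

Because every structure becomes a vector of the same length, a query costs one dot product per stored entry instead of one structural superposition per pair. An approximate index can reduce that cost further.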

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8c87/11908163/f8371fccf6aa/nihpp-2025.02.28.640875v1-f0001.jpg