
Unsupervised learning reveals landscape of local structural motifs across protein classes.

Authors

Derry Alexander, Krupkin Haim, Tartici Alp, Altman Russ B

Affiliations

Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, United States.

Department of Genetics, Stanford University, Stanford, CA 94305, United States.

Publication information

Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf377.

DOI: 10.1093/bioinformatics/btaf377
PMID: 40569048
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12258146/
Abstract

MOTIVATION

Proteins are known to share similarities in local regions of three-dimensional (3D) structure even across disparate global folds. Such correspondences can help to shed light on functional relationships between proteins and identify conserved local structural features that lead to function. Self-supervised deep learning on large protein structure datasets has produced high-fidelity representations of local structural microenvironments, providing the opportunity to characterize the landscape of local structure and function at scale.

RESULTS

In this work, we leverage these representations to cluster over 15 million environments in the Protein Data Bank, resulting in the creation of a "lexicon" of local 3D motifs which form the building blocks of all known protein structures. We characterize these motifs and demonstrate that they provide valuable information for modeling structure and function at all scales of protein analysis, from full protein chains to binding pockets to individual amino acids. We devise a new protein representation based solely on its constituent local motifs and show that this representation enables state-of-the-art performance on protein structure search and model quality assessment. We then show that this approach enables accurate prediction of drug off-target interactions by modeling the similarity between local binding pockets. Finally, we identify structural motifs associated with pathogenic variants in the human proteome by leveraging the predicted structures in the AlphaFold structure database.
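
The idea of representing a protein "based solely on its constituent local motifs" can be illustrated with a toy example. The sketch below is not the authors' implementation (their code is in the repository linked under Availability). It assumes a motif lexicon is already available as a list of cluster centroids, assigns each residue-level embedding to its nearest centroid, summarizes a protein as the normalized frequency of each motif, and compares two proteins by cosine similarity. The function names, the 2-D embeddings, the nearest-centroid assignment, and the cosine comparison are all illustrative assumptions.

```python
import math
from collections import Counter


def nearest_motif(embedding, centroids):
    """Index of the centroid closest (Euclidean distance) to one embedding."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(embedding, centroids[k]))


def motif_profile(embeddings, centroids):
    """Normalized motif-frequency vector for one protein (hypothetical helper)."""
    counts = Counter(nearest_motif(e, centroids) for e in embeddings)
    total = len(embeddings)
    return [counts.get(k, 0) / total for k in range(len(centroids))]


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy lexicon: three motif centroids in a made-up 2-D embedding space.
centroids = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Toy proteins: one embedding per residue environment (values are invented).
protein_a = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.9)]
protein_b = [(0.0, 0.1), (1.0, 0.2), (0.8, 0.0), (0.1, 1.1)]
protein_c = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.0), (0.0, -0.1)]

profile_a = motif_profile(protein_a, centroids)
profile_b = motif_profile(protein_b, centroids)
profile_c = motif_profile(protein_c, centroids)

sim_ab = cosine_similarity(profile_a, profile_b)
sim_ac = cosine_similarity(profile_a, profile_c)
print(profile_a, sim_ab, sim_ac)
```

In this toy setup, proteins A and B share the same motif composition and therefore score higher than A and C, which is the behavior a motif-based structure search would rely on. A real lexicon would come from clustering millions of learned embeddings rather than from hand-picked centroids.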

AVAILABILITY AND IMPLEMENTATION

All code and cluster data are available at https://github.com/awfderry/collapse-motifs.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0a07/12258146/abbf68f23069/btaf377f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0a07/12258146/b6fae754b38f/btaf377f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0a07/12258146/ee35ab186a02/btaf377f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0a07/12258146/cf247ca0089f/btaf377f4.jpg

Similar articles

1
Unsupervised learning reveals landscape of local structural motifs across protein classes.
Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf377.
2
Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.
3
Harnessing deep learning for proteome-scale detection of amyloid signaling motifs.
Bioinformatics. 2025 Jul 1;41(Supplement_1):i420-i428. doi: 10.1093/bioinformatics/btaf200.
4
Short-Term Memory Impairment
5
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
6
Multiscale Probabilistic Modeling: A Bayesian Approach to Augment Mechanistic Models of Cell Signaling with Machine-Learning Predictions of Binding Affinity.
bioRxiv. 2025 Jul 9:2025.05.23.655795. doi: 10.1101/2025.05.23.655795.
7
Health professionals' experience of teamwork education in acute hospital settings: a systematic review of qualitative literature.
JBI Database System Rev Implement Rep. 2016 Apr;14(4):96-137. doi: 10.11124/JBISRIR-2016-1843.
8
Leveraging a foundation model zoo for cell similarity search in oncological microscopy across devices.
Front Oncol. 2025 Jun 18;15:1480384. doi: 10.3389/fonc.2025.1480384. eCollection 2025.
9
A New Measure of Quantified Social Health Is Associated With Levels of Discomfort, Capability, and Mental and General Health Among Patients Seeking Musculoskeletal Specialty Care.
Clin Orthop Relat Res. 2025 Apr 1;483(4):647-663. doi: 10.1097/CORR.0000000000003394. Epub 2025 Feb 5.
10
Combined Topological Data Analysis and Geometric Deep Learning Reveal Niches by the Quantification of Protein Binding Pockets.
J Comput Biol. 2025 Jul;32(7):659-674. doi: 10.1089/cmb.2025.0076. Epub 2025 May 28.
