Investigating the ability of deep learning-based structure prediction to extrapolate and/or enrich the set of antibody CDR canonical forms.

Affiliation

Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, United Kingdom.

Publication information

Front Immunol. 2024 Feb 28;15:1352703. doi: 10.3389/fimmu.2024.1352703. eCollection 2024.

DOI: 10.3389/fimmu.2024.1352703

PMID: 38482007

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10933040/

Abstract

Deep learning models have been shown to accurately predict protein structure from sequence, allowing researchers to explore protein space from the structural viewpoint. In this paper we explore whether "novel" features, such as distinct loop conformations, can arise from these predictions despite not being present in the training data. Here we have used ABodyBuilder2, a deep learning antibody structure predictor, to predict the structures of ~1.5M paired antibody sequences. We examined the predicted structures of the canonical CDR loops and found that most of these predictions fall into the already described CDR canonical form structural space. We also found a small number of "new" canonical clusters composed of heterogeneous sequences united by a common sequence motif and loop conformation. Analysis of these novel clusters showed their origins to be either shapes seen in the training data at very low frequency or shapes seen at high frequency but at a shorter sequence length. To evaluate explicitly the ability of ABodyBuilder2 to extrapolate, we retrained several models whilst withholding all antibody structures of a specific CDR loop length or canonical form. These "starved" models showed evidence of generalisation across CDRs of different lengths, but they did not extrapolate to loop conformations that were highly distinct from those present in the training data. However, the models were able to accurately predict a canonical form even if only a very small number of examples of that shape were in the training data. Our results suggest that deep learning protein structure prediction methods are unable to make completely out-of-domain predictions for CDR loops. However, our analysis also showed that even minimal amounts of data for a given structural shape allow the method to recover its original predictive abilities. We have made the ~1.5M predicted structures used in this study available for download at https://doi.org/10.5281/zenodo.10280181.
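The abstract does not say how predicted loops are assigned to canonical forms or flagged as "new". As an illustration only, the sketch below assumes a nearest-centroid assignment using a dihedral-angle distance of the form Σ 2(1 − cos Δθ), similar in spirit to the metric used in earlier CDR clustering work (for example, the North et al. 2011 clustering listed under Similar articles). Each loop is represented by its flattened backbone φ/ψ angles, compared only against centroids of the same length, and treated as a candidate novel shape when no centroid lies within a threshold. The function names, centroid labels, angle values and threshold are all hypothetical and are not taken from this study.

```python
import math
from typing import Dict, Optional, Sequence


def dihedral_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of 2 * (1 - cos(delta)) over paired dihedral angles given in degrees.

    The cosine form handles periodicity, so -179 and 179 degrees count as close.
    """
    if len(a) != len(b):
        raise ValueError("loops must have the same number of dihedral angles")
    return sum(2.0 * (1.0 - math.cos(math.radians(x - y))) for x, y in zip(a, b))


def assign_canonical_form(
    loop: Sequence[float],
    centroids: Dict[str, Sequence[float]],
    threshold: float,
) -> Optional[str]:
    """Return the label of the closest same-length centroid, or None.

    None means no known canonical form lies within `threshold`, i.e. the loop
    is a candidate "novel" shape under this hypothetical scheme.
    """
    best_label: Optional[str] = None
    best_dist = math.inf
    for label, centroid in centroids.items():
        # Canonical forms are length-specific, so skip centroids of other lengths.
        if len(centroid) != len(loop):
            continue
        dist = dihedral_distance(loop, centroid)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else None


if __name__ == "__main__":
    # Hypothetical centroids for a 3-residue loop (phi/psi pairs flattened);
    # the values are placeholders, not real canonical-form centroids.
    centroids = {
        "form_A": [-60.0] * 6,
        "form_B": [60.0] * 6,
    }
    print(assign_canonical_form([-55.0] * 6, centroids, threshold=1.0))  # form_A
    print(assign_canonical_form([180.0] * 6, centroids, threshold=1.0))  # None
```

The cosine form is used instead of a plain absolute angle difference so that angles near ±180° wrap around correctly; in practice the centroids and threshold would come from an actual clustering of observed structures.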

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a4a/10933040/48656b76cf6c/fimmu-15-1352703-g001.jpg

Similar articles

Investigating the ability of deep learning-based structure prediction to extrapolate and/or enrich the set of antibody CDR canonical forms.

Front Immunol. 2024 Feb 28;15:1352703. doi: 10.3389/fimmu.2024.1352703. eCollection 2024.

Length-independent structural similarities enrich the antibody CDR canonical class model.

MAbs. 2016 May-Jun;8(4):751-60. doi: 10.1080/19420862.2016.1158370.

Transitions of CDR-L3 Loop Canonical Cluster Conformations on the Micro-to-Millisecond Timescale.

Front Immunol. 2019 Nov 19;10:2652. doi: 10.3389/fimmu.2019.02652. eCollection 2019.

A new clustering of antibody CDR loop conformations.

J Mol Biol. 2011 Feb 18;406(2):228-56. doi: 10.1016/j.jmb.2010.10.030. Epub 2010 Oct 28.

Coevolved Canonical Loops Conformations of Single-Domain Antibodies: A Tale of Three Pockets Playing Musical Chairs.

Front Immunol. 2022 Jun 3;13:884132. doi: 10.3389/fimmu.2022.884132. eCollection 2022.

Geometric potentials from deep learning improve prediction of CDR H3 loop structures.

Bioinformatics. 2020 Jul 1;36(Suppl_1):i268-i275. doi: 10.1093/bioinformatics/btaa457.

ABlooper: fast accurate antibody CDR loop structure prediction with accuracy estimation.

Bioinformatics. 2022 Mar 28;38(7):1877-1880. doi: 10.1093/bioinformatics/btac016.

Predicting antibody complementarity determining region structures without classification.

Mol Biosyst. 2011 Dec;7(12):3327-34. doi: 10.1039/c1mb05223c. Epub 2011 Oct 20.

Antibody CDR loops as ensembles in solution vs. canonical clusters from X-ray structures.

MAbs. 2020 Jan-Dec;12(1):1744328. doi: 10.1080/19420862.2020.1744328.

Automated classification of antibody complementarity determining region 3 of the heavy chain (H3) loops into canonical forms and its application to protein structure prediction.

J Mol Biol. 1998 Jun 26;279(5):1193-210. doi: 10.1006/jmbi.1998.1847.

Cited by

T-cell receptor structures and predictive models reveal comparable alpha and beta chain structural diversity despite differing genetic complexity.

Commun Biol. 2025 Mar 4;8(1):362. doi: 10.1038/s42003-025-07708-6.

Quantifying conformational changes in the TCR:pMHC-I binding interface.

Front Immunol. 2024 Dec 2;15:1491656. doi: 10.3389/fimmu.2024.1491656. eCollection 2024.

ABodyBuilder3: improved and scalable antibody structure predictions.

Bioinformatics. 2024 Oct 1;40(10). doi: 10.1093/bioinformatics/btae576.
