AbSet: A Standardized Data Set of Antibody Structures for Machine Learning Applications.

Author information

Almeida Diego S, Almeida Matheus V, Sampaio Jean V, Gaieta Eduardo M, Costa Andrielly H S, Rabelo Francisco F A, Cavalcante César L, Sartori Geraldo R, Silva João H M

Affiliations

Laboratory of Structural and Functional Biology Applied to Biopharmaceuticals, Fundação Oswaldo Cruz, Fiocruz Ceará, Eusébio 61773-270, Brazil.

Instituto Oswaldo Cruz, Fiocruz, Rio de Janeiro, Rio de Janeiro 21040-900, Brazil.

Publication information

J Chem Inf Model. 2025 May 26;65(10):4767-4774. doi: 10.1021/acs.jcim.5c00410. Epub 2025 May 11.

DOI: 10.1021/acs.jcim.5c00410
PMID: 40349368

Abstract

Machine learning (ML) algorithms have played a fundamental role in the development of therapeutic antibodies by being trained on data sets of sequences and/or structures. However, structural data sets remain limited, especially those that include antibody-antigen complexes. Additionally, many of the available structures are not standardized, and antibody-specific databases often do not provide molecular descriptors that could enhance ML models. To address this gap, we introduce AbSet, a curated data set comprising over 800,000 antibody structures and corresponding molecular descriptors, including both experimentally determined and in silico-generated antibody-antigen complexes. We systematically retrieved antibody structures from the Protein Data Bank (PDB), applied rigorous standardization protocols, and expanded the data set through large-scale protein-protein docking to generate structural variants of antibody-antigen interactions. Each model was classified as high, medium, acceptable, or incorrect quality based on structural similarity to reference experimental complexes. This classification enables both the construction of a decoy set of confirmed non-binders and the generation of high-confidence augmented structural data for ML applications. AbSet is publicly available via the Zenodo repository, with accompanying scripts hosted on GitHub (https://github.com/SFBBGroup/AbSet.git).
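
The four-way quality classification described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity metric or cutoffs were used, so this example assumes a DockQ-style score in [0, 1] with the commonly used bands (0.23, 0.49, 0.80). The `DockedModel` type, the model identifiers, and the choice to keep only "high" models as augmented data are hypothetical and are not taken from the AbSet scripts.

```python
from dataclasses import dataclass

# Assumed cutoffs: the commonly used DockQ score bands.
# They are NOT taken from the AbSet paper, which may use other criteria.
DOCKQ_BANDS = [(0.80, "high"), (0.49, "medium"), (0.23, "acceptable")]


@dataclass(frozen=True)
class DockedModel:
    model_id: str  # hypothetical identifier for a docked pose
    dockq: float   # similarity to the reference experimental complex, in [0, 1]


def classify(dockq: float) -> str:
    """Map a similarity score to one of the four quality classes."""
    if not 0.0 <= dockq <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {dockq}")
    for cutoff, label in DOCKQ_BANDS:
        if dockq >= cutoff:
            return label
    return "incorrect"


def split_models(models):
    """Split docked models into a decoy set and an augmented set.

    Assumption: 'incorrect' models serve as decoys and only 'high'
    models are kept as augmented training data.
    """
    decoys = [m for m in models if classify(m.dockq) == "incorrect"]
    augmented = [m for m in models if classify(m.dockq) == "high"]
    return decoys, augmented


# Made-up scores, for illustration only.
models = [
    DockedModel("m1", 0.91),
    DockedModel("m2", 0.55),
    DockedModel("m3", 0.30),
    DockedModel("m4", 0.05),
]
decoys, augmented = split_models(models)
print([classify(m.dockq) for m in models])
```

Models in the "medium" and "acceptable" classes fall into neither set here; whether to include them in training data is a design choice that depends on how much label noise the downstream model tolerates.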
