NanoAbLLaMA: construction of nanobody libraries with protein large language models.

Authors

Wang Xin, Chen Haotian, Chen Bo, Liang Lixin, Mei Fengcheng, Huang Bingding

Affiliations

College of Big Data and Internet, Shenzhen Technology University, Shenzhen, China.

Chengdu NBbiolab. CO., LTD., SME Incubation Park, Chengdu, China.

Publication information

Front Chem. 2025 Feb 25;13:1545136. doi: 10.3389/fchem.2025.1545136. eCollection 2025.

DOI: 10.3389/fchem.2025.1545136
PMID: 40070407
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11893428/
Abstract

INTRODUCTION

Traditional methods for constructing synthetic nanobody libraries are labor-intensive and time-consuming. This study introduces a novel approach leveraging protein large language models (LLMs) to generate germline-specific nanobody sequences, enabling efficient library construction through statistical analysis.

METHODS

We developed NanoAbLLaMA, a protein LLM based on LLaMA2, fine-tuned using low-rank adaptation (LoRA) on 120,000 curated nanobody sequences. The model generates sequences conditioned on germlines (IGHV3-3*01 and IGHV3S53*01). Training involved dataset preparation from SAbDab and experimental data, alignment with IMGT germline references, and structural validation using ImmuneBuilder and Foldseek.
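To make the LoRA step concrete, here is a minimal sketch of the low-rank update itself, written in plain Python so it runs without any deep-learning library. It is not the authors' training code: the matrix sizes, the rank, the scaling factor `alpha`, and all function names are assumptions chosen for illustration, since the abstract does not state the hyperparameters or which LLaMA2 layers were adapted.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), the weight a LoRA-adapted layer applies.

    w: frozen pretrained weight, shape (d_out, d_in)
    a: trainable matrix A, shape (r, d_in)
    b: trainable matrix B, shape (d_out, r)
    """
    r = len(a)  # LoRA rank
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy dimensions (assumed); real LLaMA2 projection matrices are far larger.
d_out, d_in, rank, alpha = 8, 8, 2, 4.0
rng = random.Random(0)

W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable
B = [[0.0] * rank for _ in range(d_out)]                              # trainable, zero-initialised

W_eff = lora_effective_weight(W, A, B, alpha)

full_params = d_out * d_in           # parameters updated by full fine-tuning
lora_params = rank * (d_in + d_out)  # parameters updated by LoRA
```

Because B starts at zero, the adapted layer initially behaves exactly like the pretrained one, and only the `rank * (d_in + d_out)` entries of A and B are trained. The saving grows with layer size, which is why the method suits fine-tuning a large base model on a comparatively small domain dataset.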

RESULTS

NanoAbLLaMA achieved near-perfect germline generation accuracy (100% for IGHV3-3*01, 95.5% for IGHV3S53*01). Structural evaluations demonstrated superior predicted Local Distance Difference Test (pLDDT) scores (90.32 ± 10.13) compared to IgLM (87.36 ± 11.17), with comparable TM-scores. Generated sequences exhibited fewer high-risk post-translational modification sites than IgLM. Statistical analysis of CDR regions confirmed diversity, particularly in CDR3, enabling the creation of synthetic libraries with high humanization (>99.9%) and low risk.
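Two of the sequence-level checks mentioned above, counting post-translational modification liabilities and measuring per-position diversity, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's pipeline: the motif set below (N-glycosylation sequon, NG deamidation, DG isomerization) is a commonly used shortlist and may differ from the authors' definition of "high-risk" sites, and the function names and toy sequences are hypothetical.

```python
import math
import re
from collections import Counter

# Assumed liability motifs; lookaheads let overlapping sites be counted.
LIABILITY_MOTIFS = {
    "N-glycosylation": re.compile(r"(?=N[^P][ST])"),  # N-X-S/T, X != P
    "deamidation": re.compile(r"(?=NG)"),
    "isomerization": re.compile(r"(?=DG)"),
}


def count_liabilities(seq):
    """Count occurrences of each assumed liability motif in one sequence."""
    return {
        name: sum(1 for _ in pattern.finditer(seq))
        for name, pattern in LIABILITY_MOTIFS.items()
    }


def position_entropy(aligned_seqs):
    """Shannon entropy in bits for each column of equal-length sequences."""
    entropies = []
    for column in zip(*aligned_seqs):
        total = len(column)
        counts = Counter(column)
        entropies.append(
            sum(-(c / total) * math.log2(c / total) for c in counts.values())
        )
    return entropies


# Hypothetical toy inputs, not real nanobody data.
liabilities = count_liabilities("ANGTDGS")
cdr_entropy = position_entropy(["AAK", "AGK", "ARK", "AYK"])
```

A column with entropy near zero is effectively fixed across the generated set, while a high-entropy column is a natural candidate for randomization when designing a synthetic library. Real CDR sequences vary in length, so they would first need to be aligned or grouped by length before this per-column calculation applies.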

DISCUSSION

This work establishes a paradigm shift in nanobody library construction by integrating LLMs, significantly reducing time and resource demands. While NanoAbLLaMA excels in germline-specific generation, limitations include restricted germline coverage and framework flexibility. Future efforts should expand germline diversity and incorporate druggability metrics for clinical relevance. The model's code, data, and resources are publicly available to facilitate broader adoption.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/22fa/11893428/d016db6f8596/fchem-13-1545136-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/22fa/11893428/66a5f447bc54/fchem-13-1545136-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/22fa/11893428/77139d2d4c9a/fchem-13-1545136-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/22fa/11893428/21409d9230d6/fchem-13-1545136-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/22fa/11893428/012aa67af1c0/fchem-13-1545136-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/22fa/11893428/39ef4b55213b/fchem-13-1545136-g006.jpg

References cited in this article

1
SYNBIP 2.0: epitopes mapping, sequence expansion and scaffolds discovery for synthetic binding protein innovation.
Nucleic Acids Res. 2025 Jan 6;53(D1):D595-D603. doi: 10.1093/nar/gkae893.
2
Applications and challenges in designing VHH-based bispecific antibodies: leveraging machine learning solutions.
MAbs. 2024 Jan-Dec;16(1):2341443. doi: 10.1080/19420862.2024.2341443. Epub 2024 Apr 26.
3
IgLM: Infilling language modeling for antibody sequence design.
Cell Syst. 2023 Nov 15;14(11):979-989.e4. doi: 10.1016/j.cels.2023.10.001. Epub 2023 Oct 30.
4
ImmuneBuilder: Deep-Learning models for predicting the structures of immune proteins.
Commun Biol. 2023 May 29;6(1):575. doi: 10.1038/s42003-023-04927-7.
5
Fast and accurate protein structure search with Foldseek.
Nat Biotechnol. 2024 Feb;42(2):243-246. doi: 10.1038/s41587-023-01773-0. Epub 2023 May 8.
6
AbLang: an antibody language model for completing antibody sequences.
Bioinform Adv. 2022 Jun 17;2(1):vbac046. doi: 10.1093/bioadv/vbac046. eCollection 2022.
7
Structural Insights into the Design of Synthetic Nanobody Libraries.
Molecules. 2022 Mar 28;27(7):2198. doi: 10.3390/molecules27072198.
8
SAbDab in the age of biotherapeutics: updates including SAbDab-nano, the nanobody structure tracker.
Nucleic Acids Res. 2022 Jan 7;50(D1):D1368-D1372. doi: 10.1093/nar/gkab1050.
9
Deep generative modeling for protein design.
Curr Opin Struct Biol. 2022 Feb;72:226-236. doi: 10.1016/j.sbi.2021.11.008. Epub 2021 Dec 25.
10
AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models.
Nucleic Acids Res. 2022 Jan 7;50(D1):D439-D444. doi: 10.1093/nar/gkab1061.