Using Large Language Model to Optimize Protein Purification: Insights from Protein Structure Literature Associated with Protein Data Bank.

Authors

Chen Zhuojian, Sivaraman J

Affiliation

Department of Biological Sciences, National University of Singapore, 14 Science Drive 4, Singapore, 117543, Singapore.

Publication information

Adv Sci (Weinh). 2025 Apr;12(15):e2413689. doi: 10.1002/advs.202413689. Epub 2025 Feb 20.

DOI:10.1002/advs.202413689
PMID:39976229
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12005808/
Abstract

Obtaining pure and homogeneous protein samples is vital for protein biology studies, yet optimizing protein expression and purification methods can be time-consuming because of variations in factors like expression conditions, buffer components, and fusion tags. With over 81 000 Protein Data Bank (PDB)-associated articles as of October 2024, manual extraction of relevant methods is impractical. To streamline this process, an automated tool is developed by incorporating a large language model (LLM) to extract and classify key data from these articles. The information extraction accuracy is enhanced by a 2-step-LLM and a 3-step-prompt. The key findings include: 1) Tris buffer is used in 49.2% of cases, followed by 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES) and phosphate buffers. 2) Polyhistidine tags dominate at 82.5%, followed by glutathione S-transferase (GST) and maltose-binding protein (MBP) tags. 3) E. coli expression is done at 16-20 °C, with induction period favoring 12-16 h (69.0%) over 3-6 h (14.3%). The statistical analyses highlight the correlation between protein properties and purification strategies. This tool is validated through two case studies: method bias for membrane protein purification, and crosslinker/detergent preferences for Cryo-Electron Microscopy sample preparation. These findings provide a valuable resource for designing protein expression and purification experiments.
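
The usage percentages reported in the abstract (for example, the share of cases using Tris buffer or polyhistidine tags) are category frequencies over the extracted records. As a minimal sketch of that kind of tally, and not the authors' tool, the Python below assumes each article has already been reduced to a dictionary of extracted fields. The function name `category_shares`, the field names, and the four toy records are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def category_shares(records, field):
    """Return {category: percent of records}, most common first.

    Records where the field is missing or empty are skipped, so the
    percentages are relative to records that actually report the field.
    """
    values = [r[field] for r in records if r.get(field)]
    counts = Counter(values)
    total = len(values)
    return {name: round(100 * n / total, 1) for name, n in counts.most_common()}


# Hypothetical toy records (NOT data from the paper).
records = [
    {"buffer": "Tris", "tag": "His"},
    {"buffer": "HEPES", "tag": "His"},
    {"buffer": "Tris", "tag": "GST"},
    {"buffer": "phosphate", "tag": "His"},
]

buffer_shares = category_shares(records, "buffer")
tag_shares = category_shares(records, "tag")
print(buffer_shares)
print(tag_shares)
```

Skipping records with a missing field keeps the denominator honest when an article does not report, say, a fusion tag. Whether the published percentages were computed per article, per protein, or per purification step is not stated in the abstract.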

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/873a/12005808/7af905fdd864/ADVS-12-2413689-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/873a/12005808/52292dfb40c2/ADVS-12-2413689-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/873a/12005808/c17075818c37/ADVS-12-2413689-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/873a/12005808/c04f496b9952/ADVS-12-2413689-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/873a/12005808/7cd094e4e1a4/ADVS-12-2413689-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/873a/12005808/1facde5ee428/ADVS-12-2413689-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/873a/12005808/dfbfe5458a23/ADVS-12-2413689-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/873a/12005808/2fa964a7c62f/ADVS-12-2413689-g009.jpg

Similar articles

1
Using Large Language Model to Optimize Protein Purification: Insights from Protein Structure Literature Associated with Protein Data Bank.
Adv Sci (Weinh). 2025 Apr;12(15):e2413689. doi: 10.1002/advs.202413689. Epub 2025 Feb 20.
2
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
3
Systematic analysis of the expression, solubility and purification of a passenger protein in fusion with different tags.
Protein Expr Purif. 2018 Dec;152:92-106. doi: 10.1016/j.pep.2018.07.007. Epub 2018 Jul 20.
4
Database Study on the Expression and Purification of Membrane Proteins.
Protein Pept Lett. 2021;28(9):972-982. doi: 10.2174/0929866528666210415120234.
5
Production of Computationally Designed Small Soluble- and Membrane-Proteins: Cloning, Expression, and Purification.
Methods Mol Biol. 2017;1529:95-106. doi: 10.1007/978-1-4939-6637-0_3.
6
Affinity purification of proteins binding to GST fusion proteins.
Curr Protoc Mol Biol. 2001 May;Chapter 20:Unit 20.2. doi: 10.1002/0471142727.mb2002s33.
7
Efficient protein production method for NMR using soluble protein tags with cold shock expression vector.
J Biomol NMR. 2010 Nov;48(3):147-55. doi: 10.1007/s10858-010-9445-5. Epub 2010 Sep 16.
8
Affinity Purification of a Recombinant Protein Expressed as a Fusion with the Maltose-Binding Protein (MBP) Tag.
Methods Enzymol. 2015;559:17-26. doi: 10.1016/bs.mie.2014.11.004. Epub 2015 Apr 15.
9
High-Level Soluble Expression and One-step Purification of HTLV-I P19 Protein in Escherichia coli by Fusion Expression.
Iran J Allergy Asthma Immunol. 2015 Dec;14(6):624-32.
10
Large-scale production, purification and refolding of the full-length cellular prion protein from Syrian golden hamster in Escherichia coli using the glutathione S-transferase-fusion system.
Eur J Biochem. 1998 Jan 15;251(1-2):462-71. doi: 10.1046/j.1432-1327.1998.2510462.x.
