Machine learning in computational biology to accelerate high-throughput protein expression.

Authors

Sastry Anand, Monk Jonathan, Tegel Hanna, Uhlen Mathias, Palsson Bernhard O, Rockberg Johan, Brunk Elizabeth

Affiliations

Department of Bioengineering, University of California, San Diego, CA, USA.

KTH - Royal Institute of Technology, Department of Proteomics and Nanobiotechnology, SE-106 91 Stockholm, Sweden.

Publication

Bioinformatics. 2017 Aug 15;33(16):2487-2495. doi: 10.1093/bioinformatics/btx207.

DOI:10.1093/bioinformatics/btx207
PMID:28398465
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5870730/
Abstract

MOTIVATION

The Human Protein Atlas (HPA) enables the simultaneous characterization of thousands of proteins across various tissues to pinpoint their spatial location in the human body. This has been achieved through transcriptomics and high-throughput immunohistochemistry-based approaches, where over 40 000 unique human protein fragments have been expressed in E. coli. These datasets enable quantitative tracking of entire cellular proteomes and present new avenues for understanding molecular-level properties influencing expression and solubility.

RESULTS

Combining computational biology and machine learning identifies protein properties that hinder the HPA high-throughput antibody production pipeline. We predict protein expression and solubility with accuracies of 70% and 80%, respectively, based on a subset of key properties (aromaticity, hydropathy and isoelectric point). We guide the selection of protein fragments based on these characteristics to optimize high-throughput experimentation.
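The three properties named above can each be computed directly from a fragment's amino acid sequence. The following is a minimal sketch of that feature step, not the authors' code (their workflow is in the notebooks referenced under Availability). It assumes aromaticity is the fraction of F, W and Y residues, hydropathy is the mean Kyte-Doolittle score (GRAVY), and the isoelectric point is found by bisection on a Henderson-Hasselbalch net-charge model. The pKa values and all function names are illustrative assumptions; pKa sets differ between tools, so absolute pI values will differ too.

```python
# Sketch: sequence-derived features (aromaticity, hydropathy, isoelectric point).
# Function names and pKa values are assumptions for illustration only.

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = set("FWY")

# Assumed side-chain and terminal pKa values (one of several published sets).
PKA_POS = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEG = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_NTERM, PKA_CTERM = 8.6, 3.6


def aromaticity(seq):
    """Fraction of residues that are Phe, Trp or Tyr."""
    return sum(aa in AROMATIC for aa in seq) / len(seq)


def gravy(seq):
    """Mean Kyte-Doolittle hydropathy; raises KeyError on non-standard residues."""
    return sum(KD[aa] for aa in seq) / len(seq)


def net_charge(seq, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    pos = 1.0 / (1.0 + 10 ** (ph - PKA_NTERM))
    neg = 1.0 / (1.0 + 10 ** (PKA_CTERM - ph))
    for aa in seq:
        if aa in PKA_POS:
            pos += 1.0 / (1.0 + 10 ** (ph - PKA_POS[aa]))
        elif aa in PKA_NEG:
            neg += 1.0 / (1.0 + 10 ** (PKA_NEG[aa] - ph))
    return pos - neg


def isoelectric_point(seq, tol=1e-4):
    """pH at which net charge is zero, located by bisection on [0, 14]."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged, so pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


def fragment_features(seq):
    """Collect the three features for one protein fragment sequence."""
    seq = seq.upper()
    return {
        "aromaticity": aromaticity(seq),
        "hydropathy": gravy(seq),
        "isoelectric_point": isoelectric_point(seq),
    }
```

A feature dictionary like this could serve as one input row for a classifier of expression or solubility. Which model and which additional features to use is left to the published workflow.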

AVAILABILITY AND IMPLEMENTATION

We present the machine learning workflow as a series of IPython notebooks hosted on GitHub (https://github.com/SBRG/Protein_ML). The workflow can be used as a template for analysis of further expression and solubility datasets.

CONTACT

ebrunk@ucsd.edu or johanr@biotech.kth.se.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1. Machine learning in computational biology to accelerate high-throughput protein expression.
   Bioinformatics. 2017 Aug 15;33(16):2487-2495. doi: 10.1093/bioinformatics/btx207.
2. Develop machine learning-based regression predictive models for engineering protein solubility.
   Bioinformatics. 2019 Nov 1;35(22):4640-4646. doi: 10.1093/bioinformatics/btz294.
3. PyMethylProcess-convenient high-throughput preprocessing workflow for DNA methylation data.
   Bioinformatics. 2019 Dec 15;35(24):5379-5381. doi: 10.1093/bioinformatics/btz594.
4. Scaling tree-based automated machine learning to biomedical big data with a feature set selector.
   Bioinformatics. 2020 Jan 1;36(1):250-256. doi: 10.1093/bioinformatics/btz470.
5. SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics.
   Nucleic Acids Res. 2001 Jul 1;29(13):2884-98. doi: 10.1093/nar/29.13.2884.
6. PaRSnIP: sequence-based protein solubility prediction using gradient boosting machine.
   Bioinformatics. 2018 Apr 1;34(7):1092-1098. doi: 10.1093/bioinformatics/btx662.
7. ssbio: a Python framework for structural systems biology.
   Bioinformatics. 2018 Jun 15;34(12):2155-2157. doi: 10.1093/bioinformatics/bty077.
8. DNCON2: improved protein contact prediction using two-level deep convolutional neural networks.
   Bioinformatics. 2018 May 1;34(9):1466-1472. doi: 10.1093/bioinformatics/btx781.
9. Extreme learning machines for reverse engineering of gene regulatory networks from expression time series.
   Bioinformatics. 2018 Apr 1;34(7):1253-1260. doi: 10.1093/bioinformatics/btx730.
10. NetSolP: predicting protein solubility in Escherichia coli using language models.
    Bioinformatics. 2022 Jan 27;38(4):941-946. doi: 10.1093/bioinformatics/btab801.

Cited by

1. Machine learning modeling for solubility prediction of recombinant antibody fragment in four different E. coli strains.
   Sci Rep. 2022 Mar 31;12(1):5463. doi: 10.1038/s41598-022-09500-6.
2. Machine and Deep Learning for Prediction of Subcellular Localization.
   Methods Mol Biol. 2021;2361:249-261. doi: 10.1007/978-1-0716-1641-3_15.
