Setup of a scientific computing environment for computational biology: Simulation of a genome-scale metabolic model of Escherichia coli as an example.

Affiliations

Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea.

KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea.

Publication information

J Microbiol. 2020 Mar;58(3):227-234. doi: 10.1007/s12275-020-9516-6. Epub 2020 Feb 27.

DOI:10.1007/s12275-020-9516-6

PMID:32108317

Abstract

Computational analysis of biological data is becoming increasingly important, especially in this era of big data. It allows biological insights to be derived efficiently from given data, sometimes even counterintuitive ones that may challenge existing knowledge. Among experimental researchers without any prior exposure to computer programming, such analysis has often been considered a task reserved for computational biologists. However, thanks to the increasing availability of user-friendly computational resources, experimental researchers can now easily access what they need, including a scientific computing environment and the packages necessary for data analysis. In this regard, we here describe the process of accessing Jupyter Notebook, the most popular Python coding environment, to conduct computational biology. Python is currently a mainstream programming language for biology and biotechnology. In particular, Anaconda and Google Colaboratory are introduced as two representative options for easily launching Jupyter Notebook. Finally, the Python package COBRApy is demonstrated as an example to simulate 1) the specific growth rate of Escherichia coli, as well as the compounds consumed or generated, in a minimal medium with glucose as the sole carbon source, and 2) the theoretical production yield of succinic acid, an industrially important chemical, using E. coli. This protocol should serve as a guide to further, more extensive computational analyses of biological data for experimental researchers without a computational background.
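
The two simulations named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the protocol's own code, which is not reproduced on this page. The function names (growth_and_exchanges, theoretical_yield, run_textbook_demo) are hypothetical, and the small E. coli core model bundled with COBRApy as "textbook" is assumed here in place of whichever genome-scale model the paper uses. The reaction IDs EX_glc__D_e and EX_succ_e are the BiGG-style exchange IDs of that bundled model, whose default medium is assumed to be a glucose minimal medium.

```python
def growth_and_exchanges(model):
    """Maximize the model's current objective (biomass by default).

    Returns the objective value (specific growth rate, 1/h) and a dict of
    non-zero exchange fluxes (negative = consumed, positive = produced).
    """
    solution = model.optimize()
    exchanges = {
        rxn.id: solution.fluxes[rxn.id]
        for rxn in model.exchanges
        if abs(solution.fluxes[rxn.id]) > 1e-9
    }
    return solution.objective_value, exchanges


def theoretical_yield(model, product_exchange_id, substrate_exchange_id):
    """Maximize product secretion; return mol product per mol substrate."""
    with model:  # changes made inside this block are reverted on exit
        model.objective = product_exchange_id
        solution = model.optimize()
        product_flux = solution.fluxes[product_exchange_id]
        # Uptake is reported as a negative exchange flux, so flip the sign.
        substrate_uptake = -solution.fluxes[substrate_exchange_id]
    if substrate_uptake <= 0:
        raise ValueError("substrate is not taken up in this solution")
    return product_flux / substrate_uptake


def run_textbook_demo():
    """Hypothetical driver; requires COBRApy (pip install cobra)."""
    # Imported here so the helpers above do not depend on COBRApy directly.
    from cobra.io import load_model

    # Assumption: the bundled E. coli core model stands in for the
    # genome-scale model used in the paper.
    model = load_model("textbook")

    growth, exchanges = growth_and_exchanges(model)
    print(f"Specific growth rate: {growth:.3f} 1/h")
    for rxn_id, flux in sorted(exchanges.items()):
        role = "consumed" if flux < 0 else "produced"
        print(f"  {rxn_id}: {flux:.2f} mmol/gDW/h ({role})")

    succ_yield = theoretical_yield(model, "EX_succ_e", "EX_glc__D_e")
    print(f"Theoretical succinate yield: {succ_yield:.2f} mol/mol glucose")
```

In a Jupyter Notebook cell (launched through Anaconda or Google Colaboratory) with COBRApy installed, calling run_textbook_demo() would print the simulated growth rate, the exchanged compounds, and the succinic acid yield for the core model. Numbers obtained this way will generally differ from those of a genome-scale model.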
