Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141, Republic of Korea.
KAIST Institute for Artificial Intelligence, KAIST, Daejeon, 34141, Republic of Korea.
J Microbiol. 2020 Mar;58(3):227-234. doi: 10.1007/s12275-020-9516-6. Epub 2020 Feb 27.
Computational analysis of biological data is becoming increasingly important, especially in this era of big data. It allows biological insights to be derived efficiently from given data, sometimes including counterintuitive ones that challenge existing knowledge. Experimental researchers without prior exposure to computer programming have often regarded such analysis as a task reserved for computational biologists. However, thanks to the increasing availability of user-friendly computational resources, experimental researchers can now easily access a scientific computing environment and the packages necessary for data analysis. In this regard, we here describe the process of accessing Jupyter Notebook, the most popular Python coding environment, to conduct computational biology. Python is currently a mainstream programming language for biology and biotechnology. In particular, Anaconda and Google Colaboratory are introduced as two representative options for easily launching Jupyter Notebook. Finally, the Python package COBRApy is demonstrated as an example to simulate 1) the specific growth rate of Escherichia coli, as well as the compounds consumed or generated, in a minimal medium with glucose as the sole carbon source, and 2) the theoretical production yield of succinic acid, an industrially important chemical, using E. coli. This protocol should serve as a guide to further, more extensive computational analyses of biological data for experimental researchers without a computational background.
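The two COBRApy simulations named in the abstract could look roughly like the following. This is a minimal sketch, not the protocol's own code: it assumes the small E. coli core model bundled with COBRApy (loaded as "textbook"), which may differ from the model used in the paper, and the helper names `load_e_coli_core`, `growth_on_glucose`, and `max_succinate_yield`, as well as the glucose uptake rate of 10 mmol/gDW/h, are illustrative choices. The reaction identifiers `EX_glc__D_e` and `EX_succ_e` are those of that bundled model.

```python
def load_e_coli_core():
    """Load the small E. coli core model that ships with COBRApy.

    Assumption: COBRApy is installed (e.g. `pip install cobra` in a
    Jupyter Notebook cell or an Anaconda prompt).
    """
    from cobra.io import load_model
    return load_model("textbook")


def growth_on_glucose(model, glucose_uptake=10.0):
    """Maximize biomass formation with glucose as the sole carbon source.

    Returns the predicted specific growth rate (1/h) and a dict of the
    non-zero exchange fluxes (negative = consumed, positive = secreted).
    """
    with model:  # changes made inside this block are reverted on exit
        # Exchange reactions use negative flux for uptake.
        model.reactions.get_by_id("EX_glc__D_e").lower_bound = -glucose_uptake
        solution = model.optimize()
        exchange_fluxes = {
            rxn.id: float(solution.fluxes[rxn.id])
            for rxn in model.exchanges
            if abs(solution.fluxes[rxn.id]) > 1e-9
        }
    return float(solution.objective_value), exchange_fluxes


def max_succinate_yield(model, glucose_uptake=10.0):
    """Maximize succinate secretion and return the molar yield on glucose."""
    with model:
        model.reactions.get_by_id("EX_glc__D_e").lower_bound = -glucose_uptake
        # Replace the biomass objective with the succinate exchange reaction.
        model.objective = "EX_succ_e"
        solution = model.optimize()
    return float(solution.objective_value) / glucose_uptake
```

In a notebook, one might call `model = load_e_coli_core()` once and then pass `model` to `growth_on_glucose` and `max_succinate_yield`. Because both helpers work inside a `with model:` block, the medium and objective changes do not carry over between calls.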