Institute for Protein Innovation, Boston, Massachusetts.
Institute for Protein Design, University of Washington, Seattle, Washington.
Protein Sci. 2020 Jan;29(1):43-51. doi: 10.1002/pro.3721. Epub 2019 Dec 2.
The Rosetta software suite for macromolecular modeling is a powerful computational toolbox for protein design, structure prediction, and protein structure analysis. The development of novel Rosetta-based scientific tools requires two orthogonal skill sets: deep domain-specific expertise in protein biochemistry and technical expertise in the development, deployment, and analysis of molecular simulations. Furthermore, the computational demands of molecular simulation necessitate large-scale cluster-based or distributed solutions for nearly all scientifically relevant tasks. To reduce the technical barriers to entry for new development, we integrated Rosetta with modern, widely adopted computational infrastructure. This simplifies deployment in large-scale cluster and cloud computing environments and allows effective reuse of common libraries for simulation execution and data analysis. First, we integrated Rosetta with the Conda package manager, which simplifies installation into existing computational environments and packaging as Docker images for cloud deployment. Then, we developed programming interfaces to integrate Rosetta with the PyData stack for analysis and distributed computing, including the popular tools Jupyter, Pandas, and Dask. We demonstrate the utility of these components by generating a library of a thousand de novo disulfide-rich miniproteins in a hybrid simulation that included cluster-based design and interactive notebook-based analyses. Our new tools enable users who would otherwise lack access to the necessary computational infrastructure to perform state-of-the-art molecular simulation and design with Rosetta.
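The hybrid workflow described above (many independent design runs executed in parallel, followed by tabular filtering of the results) can be illustrated with a minimal sketch. This is not the authors' code: it uses only the Python standard library, with `concurrent.futures` standing in for a Dask cluster and a list of dictionaries standing in for a Pandas DataFrame. The function `mock_design`, its score values, and the filter thresholds are all hypothetical placeholders and do not call Rosetta.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def mock_design(seed: int) -> dict:
    """Hypothetical stand-in for one design trajectory (does NOT call Rosetta)."""
    rng = random.Random(seed)  # seeded so each task is reproducible
    return {
        "seed": seed,
        "total_score": rng.uniform(-120.0, -60.0),  # made-up value, not a real energy
        "n_disulfides": rng.randint(0, 3),           # made-up disulfide count
    }


def run_designs(seeds, max_workers: int = 4) -> list:
    """Map the design task over all seeds in parallel and gather the records.

    A distributed scheduler such as Dask would play this role on a cluster;
    a local thread pool is used here only to keep the sketch self-contained.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(mock_design, seeds))  # results keep the seed order


def select_designs(records, min_disulfides: int = 2, top_n: int = 5) -> list:
    """Keep disulfide-rich designs and return the lowest-scoring ones."""
    kept = [r for r in records if r["n_disulfides"] >= min_disulfides]
    return sorted(kept, key=lambda r: r["total_score"])[:top_n]


records = run_designs(range(100))
best = select_designs(records)
for row in best:
    print(row)
```

In an interactive notebook, the `records` list would typically be loaded into a DataFrame for exploration; the selection step here shows the same filter-and-rank idea in plain Python.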