Protein structure prediction and design for high-throughput computing.

Authors

Mathew Vinay Saji, Kellogg Gretta D, Lai William Km

Affiliations

Department of Industrial Engineering, Pennsylvania State University, University Park, PA 16802, USA.

Institute for Computational and Data Sciences, Pennsylvania State University, University Park, PA 16802, USA.

Publication information

bioRxiv. 2025 Jul 22:2025.07.18.665594. doi: 10.1101/2025.07.18.665594.

Abstract

Recent advances in structural biology and machine learning have resulted in a revolution in molecular biology. This revolution is driven by protein structure prediction and design tools such as AlphaFold3, Chai-1, and Boltz-2, which can now accurately model protein structures and predict, at atomic resolution, protein-complex formation with a variety of substrates (DNA, RNA, small ligands, and post-translational modifications). The impact of these protein structure prediction algorithms has been matched by the emergence of protein design platforms such as RFdiffusion, which now promise to revolutionize synthetic biology and novel disease therapeutics. Despite their potential to transform molecular biology, the adoption of these algorithms is hindered in part by their high computational requirements and by the difficulty of deploying them on available systems. To help address these barriers, we developed containerized solutions for AlphaFold3, Chai-1, Boltz-2, and RFdiffusion, optimized across a variety of computational architectures (e.g., x86 and ARM). Additionally, we present OmniFold, an optimized wrapper platform with automatic QC report generation that enables AlphaFold3, Chai-1, and Boltz-2 to run simultaneously while utilizing GPU systems more efficiently. Precompiled containers and their definition files are available as open source through Sylabs and GitHub. We hope that these containers and repositories will facilitate reproducibility, improve accessibility, and accelerate scientific discovery.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a96/12330748/8b61686e7153/nihpp-2025.07.18.665594v1-f0001.jpg
