Corley Nathaniel, Mathis Simon, Krishna Rohith, Bauer Magnus S, Thompson Tuscan R, Ahern Woody, Kazman Maxwell W, Brent Rafael I, Didi Kieran, Kubaney Andrew, McHugh Lilian, Nagle Arnav, Favor Andrew, Kshirsagar Meghana, Sturmfels Pascal, Li Yanjing, Butcher Jasper, Qiang Bo, Schaaf Lars L, Mitra Raktim, Campbell Katelyn, Zhang Odin, Weissman Roni, Humphreys Ian R, Cong Qian, Funk Jonathan, Sonthalia Shreyash, Liò Pietro, Baker David, DiMaio Frank
Institute for Protein Design, University of Washington, Seattle, 98105, Washington, USA.
Department of Bioengineering, University of Washington, Seattle, 98105, Washington, USA.
bioRxiv. 2025 Aug 15:2025.08.14.670328. doi: 10.1101/2025.08.14.670328.
Deep learning methods trained on protein structure databases have revolutionized biomolecular structure prediction, but developing and training new models remains a considerable challenge. To facilitate the development of new models, we present AtomWorks: a broadly applicable data framework for developing state-of-the-art biomolecular foundation models spanning diverse tasks, including structure prediction, generative protein design, and fixed backbone sequence design. We use AtomWorks to train RosettaFold-3 (RF3), a structure prediction network capable of predicting arbitrary biomolecular complexes with an improved treatment of chirality that narrows the performance gap between closed-source AlphaFold3 (AF3) and existing open-source implementations. We expect that AtomWorks will accelerate the next generation of open-source biomolecular machine learning models and that RF3 will be broadly useful as a structure prediction tool. To this end, we release the AtomWorks framework (https://github.com/RosettaCommons/atomworks), together with curated training data, code and model weights for RF3 (https://github.com/RosettaCommons/modelforge) under a permissive BSD license.