Wang Xudong, Zhang Tong, Liu Guangbu, Cui Zhen, Zeng Zhiyong, Long Cheng, Zheng Wenming, Yang Jian
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
School of Automation, Nanjing University of Science and Technology, Nanjing, 210094, China.
Adv Sci (Weinh). 2025 May;12(19):e2309051. doi: 10.1002/advs.202309051. Epub 2025 Mar 25.
Accurately predicting a protein's 3D structure from its sequence is of great significance in biological research. RoseTTAFold, a representative large deep model, has addressed this problem with promising success. Here, a light-weight deep graph network named LightRoseTTA is reported that achieves accurate and highly efficient protein structure prediction. LightRoseTTA has three highlights: i) highly accurate structure prediction, competitive with RoseTTAFold on multiple popular datasets including CASP14 and CAMEO; ii) highly efficient training and inference with a light-weight model, requiring only 1 week on a single NVIDIA 3090 GPU for training (vs 30 days on 8 NVIDIA V100 GPUs for RoseTTAFold) and containing only 1.4M parameters (vs 130M in RoseTTAFold); iii) low dependency on multiple sequence alignments (MSAs), achieving the best performance on three MSA-insufficient datasets: Orphan, De novo, and Orphan25. In addition, experiments verify that LightRoseTTA is transferable from general proteins to antibody data. The time and resource costs of LightRoseTTA and RoseTTAFold are further discussed to demonstrate the feasibility of light-weight models for protein structure prediction, which may be crucial for resource-limited research at universities and academic institutions. The code and model are released to speed up biological research (https://github.com/psp3dcg/LightRoseTTA).
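The cost figures quoted in the abstract can be put on a common scale with a little arithmetic. The sketch below is only an illustration: it uses the numbers stated above (1 week on one GPU, 30 days on 8 GPUs, 1.4M and 130M parameters), while the dictionary layout and the function names `gpu_days` and `compare` are assumptions introduced here and are not taken from the LightRoseTTA code.

```python
# Figures as quoted in the abstract; nothing here is measured independently.
LIGHTROSETTA = {"params_millions": 1.4, "train_days": 7, "gpus": 1}    # 1 week, one NVIDIA 3090
ROSETTAFOLD = {"params_millions": 130.0, "train_days": 30, "gpus": 8}  # 30 days, 8 NVIDIA V100


def gpu_days(model):
    """Training cost in GPU-days: wall-clock days times number of GPUs."""
    return model["train_days"] * model["gpus"]


def compare(small, large):
    """Return GPU-day totals and large/small ratios for two models."""
    return {
        "small_gpu_days": gpu_days(small),
        "large_gpu_days": gpu_days(large),
        "gpu_day_ratio": gpu_days(large) / gpu_days(small),
        "param_ratio": large["params_millions"] / small["params_millions"],
    }


summary = compare(LIGHTROSETTA, ROSETTAFOLD)
print(f"GPU-days: {summary['small_gpu_days']} vs {summary['large_gpu_days']}")
print(f"GPU-day ratio: {summary['gpu_day_ratio']:.1f}x")
print(f"Parameter ratio: {summary['param_ratio']:.1f}x")
```

GPU-days is a rough proxy only: an NVIDIA 3090 and an NVIDIA V100 differ in memory and throughput, so the ratio indicates scale rather than an exact speed-up.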