Cui Xinyue, Xia Yuhao, Hou Minghua, Zhao Xuanfeng, Wang Suhui, Zhang Guijun
College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China.
BMC Bioinformatics. 2025 May 5;26(1):120. doi: 10.1186/s12859-025-06131-2.
Association and cooperation among structural domains play an important role in protein function and drug design. Despite remarkable advancements in highly accurate single-domain protein structure prediction through the collaborative efforts of the community using deep learning, challenges still exist in predicting multi-domain protein structures when the evolutionary signal for a given domain pair is weak or the protein structure is large.
To alleviate these challenges, we propose M-DeepAssembly, a protocol for multi-domain protein structure prediction based on a multi-objective protein conformation sampling algorithm. First, inter-domain interactions and full-length sequence distance features are extracted with DeepAssembly and AlphaFold2, respectively. Second, using these features, we construct a multi-objective energy model and design a sampling algorithm that explores and exploits conformational space to generate ensembles. Finally, the output protein structure is selected from the ensembles using our in-house model quality assessment algorithm. On a test set of 164 multi-domain proteins, the average TM-score of M-DeepAssembly is 15.4% and 2.0% higher than that of AlphaFold2 and DeepAssembly, respectively. Notably, the ensembles contain models of higher accuracy that were not selected; these models achieve improvements of 20.3% and 6.4% over the two baseline methods, respectively. Furthermore, compared with the AlphaFold2 predictions for CASP15 multi-domain targets, M-DeepAssembly demonstrates certain performance advantages.
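The abstract does not say how the objectives of the energy model are traded off during sampling. As a generic illustration of the multi-objective idea only, and not the authors' implementation, the sketch below keeps the non-dominated (Pareto-optimal) conformations from a set scored by two energy terms. The function names, the two-objective setup, and the energy values are all hypothetical; lower energy is assumed to be better.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is no worse than `b` in every objective
    and strictly better in at least one (lower is better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(energies: List[Sequence[float]]) -> List[int]:
    """Return indices of conformations that no other conformation dominates."""
    return [
        i
        for i, e in enumerate(energies)
        if not any(dominates(other, e) for j, other in enumerate(energies) if j != i)
    ]


# Hypothetical (inter-domain energy, full-length distance energy) pairs,
# one per sampled conformation. These numbers are made up for illustration.
energies = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.0, 3.5)]
front = non_dominated(energies)
print(front)  # indices of the Pareto-optimal conformations
```

Keeping the whole non-dominated set, rather than collapsing the objectives into one weighted sum, is one common way to retain a diverse ensemble from which a separate quality assessment step can later pick a final model.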
M-DeepAssembly provides a distinctive multi-domain protein assembly algorithm that can, to some extent, alleviate the current challenges of weak evolutionary signals and large structures by forming diverse ensembles with a multi-objective protein conformation sampling algorithm. The proposed method contributes to exploring the functions of multi-domain proteins and, in particular, offers new insights into targets with multiple conformational states.