Savojardo Castrense, Manfredi Matteo, Martelli Pier Luigi, Casadio Rita
Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Via San Giacomo 9/2, Bologna, 40126, Italy.
The Alma Climate Institute, Interdepartmental Center, University of Bologna, Bologna, 40100, Italy.
Bioinformatics. 2024 Dec 26;41(1). doi: 10.1093/bioinformatics/btaf019.
Knowing how residue variations affect protein stability is an important step toward functional protein design and toward understanding how protein variants can promote disease onset. Computational methods complement experimental approaches and enable fast screening of large datasets of variations.
In this work, we present DDGemb, a novel method that combines protein language model embeddings with transformer architectures to predict protein stability changes (ΔΔG) upon both single- and multi-point variations. DDGemb has been trained on a high-quality dataset derived from the literature and tested on available benchmark datasets of single- and multi-point variations. DDGemb performs at the state of the art on both single- and multi-point variations.
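A sequence-based predictor that handles both single- and multi-point variations needs the variant sequence, not only the list of substitutions, before language-model embeddings can be computed for it. The sketch below shows one plausible way to build that sequence; it is not DDGemb's code, and the comma-separated "T3A,K8R" notation and the `apply_variation` helper are hypothetical, not the web server's documented input format.

```python
import re

# Assumed notation (hypothetical): wild-type residue, 1-based position, new residue,
# e.g. "T3A"; multi-point variations are comma-separated, e.g. "T3A,K8R".
SUBSTITUTION = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])")


def apply_variation(wild_type: str, variation: str) -> str:
    """Build the variant sequence for a single- or multi-point variation."""
    residues = list(wild_type)
    positions = set()
    for token in variation.split(","):
        match = SUBSTITUTION.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"malformed substitution: {token!r}")
        wt_residue, position, new_residue = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(wild_type):
            raise ValueError(f"position {position} is outside the sequence")
        # Check the stated wild-type residue against the original sequence,
        # so that earlier substitutions cannot mask a mismatch.
        if wild_type[position - 1] != wt_residue:
            raise ValueError(
                f"expected {wt_residue} at {position}, found {wild_type[position - 1]}"
            )
        if position in positions:
            raise ValueError(f"position {position} is substituted more than once")
        positions.add(position)
        residues[position - 1] = new_residue
    return "".join(residues)


if __name__ == "__main__":
    print(apply_variation("MKTAYIAK", "T3A"))      # single-point: MKAAYIAK
    print(apply_variation("MKTAYIAK", "T3A,K8R"))  # multi-point:  MKAAYIAR
```

Validating each stated wild-type residue against the original sequence catches misaligned numbering, a common source of errors when variation datasets are collected from the literature.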
DDGemb is available as web server at https://ddgemb.biocomp.unibo.it. Datasets used in this study are available at https://ddgemb.biocomp.unibo.it/datasets.