Kotowski Krzysztof, Roterman Irena, Stapor Katarzyna
Department of Applied Informatics, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland.
Department of Bioinformatics and Telemedicine, Jagiellonian University Medical College, Medyczna 7, 30-688, Kraków, Poland.
Comput Biol Med. 2025 Feb;185:109586. doi: 10.1016/j.compbiomed.2024.109586. Epub 2024 Dec 20.
Predicting intrinsically disordered regions has significant implications for understanding protein function and dynamics. It can help to discover novel protein-protein interactions that are essential for designing new drugs and enzymes. A new generation of predictors based on protein language models (pLMs) has recently emerged. These algorithms reach state-of-the-art accuracy without computing time-consuming multiple sequence alignments (MSAs). This article introduces DisorderUnetLM, a new disorder predictor that builds upon the idea of ProteinUnet. It uses the Attention U-Net convolutional network and incorporates features from the ProtTrans pLM. DisorderUnetLM achieves top results in direct comparison with recent predictors that exploit MSAs and pLMs. Moreover, among 43 predictors on the latest CAID-2 benchmark, it ranks 1st on the NOX subset in terms of the ROC-AUC metric (0.844) and 2nd in terms of the AP metric (0.596). On the CAID-2 PDB subset, it ranks in the top 10 (ROC-AUC of 0.924 and AP of 0.862). The code and model are publicly available and fully reproducible at doi.org/10.24433/CO.7350682.v1.
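For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how ROC-AUC and average precision (AP) can be computed from per-residue disorder scores. It is not the CAID-2 evaluation code and not the authors' implementation. The function names `roc_auc` and `average_precision` and the toy labels and scores are hypothetical and chosen only for illustration; label 1 is assumed to mean "disordered".

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random disordered residue
    scores higher than a random ordered one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one residue of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AP as the sum over distinct score thresholds of
    (recall gain at the threshold) * (precision at the threshold)."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one disordered residue")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    ap, tp, i = 0.0, 0, 0
    while i < len(ranked):
        # Group residues that share the same score into one threshold.
        j, tp_here = i, 0
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp_here += ranked[j][1]
            j += 1
        tp += tp_here
        ap += (tp_here / total_pos) * (tp / j)  # j residues predicted so far
        i = j
    return ap


# Hypothetical toy data: six residues, 1 = disordered, 0 = ordered.
toy_labels = [0, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(roc_auc(toy_labels, toy_scores))
print(average_precision(toy_labels, toy_scores))
```

Both metrics are threshold-free: they depend only on how the predictor ranks residues, not on a specific disorder cutoff. AP is the more sensitive of the two to class imbalance, which is one reason benchmark results are often reported with both.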