Uncertainty Quantification and Temperature Scaling Calibration for Protein-RNA Binding Site Prediction.

Author information

Zeng Ximin, Wang Hongmei, Zhao Long, Cheng Yue, Zhou Danping, Shi Shaoping

Affiliations

Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China.

Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang 330031, China.

Publication information

J Chem Inf Model. 2025 Jun 23;65(12):6310-6321. doi: 10.1021/acs.jcim.5c00556. Epub 2025 Jun 2.

Abstract

The black-box nature of deep learning has increasingly drawn attention to the reliability and uncertainty of predictive models. Several uncertainty quantification (UQ) methods have been proposed and successfully applied to molecules and proteins, improving both the quality and the interpretability of model predictions. Protein-RNA binding is a fundamental aspect of protein research, and both accurate prediction of binding sites and assurance that those predictions are reliable are crucial for many scientific applications. However, many existing computational methods rely on a single type of feature extraction and lack UQ. To address these limitations, we propose MGCA (multiscale graph convolutional networks, convolutional neural networks, and attention), which better captures local and global information and achieves competitive results in predicting protein-RNA binding sites. Moreover, we conduct a UQ study based on MGCA and five prevalent models to verify the robustness of the results. Specifically, we use the Expected Calibration Error (ECE) to assess model uncertainty. Additionally, we propose a novel split-bins screening method based on the ECE to investigate the practical impact of reducing uncertainty on the models. Finally, temperature scaling (TS) is used to calibrate model uncertainty without changing predictive performance. Results show that the split-bins screening method reduces false positives (FP) and that TS significantly decreases the model ECE. Combining the split-bins screening method with TS further reduces FP and improves precision. Our findings demonstrate that TS effectively reduces uncertainty in protein-RNA binding site prediction and that minimizing model uncertainty enhances prediction quality. The data and code are available at https://github.com/trustcm/UQ-TS-Split-bins-RBP.
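
The abstract names ECE and temperature scaling but gives no formulas, so the following is only a minimal sketch of the standard textbook versions, not the authors' implementation. It assumes a binary per-residue classifier that outputs one logit, equal-width confidence bins, and a simple grid search for the temperature; the function names and the synthetic, deliberately overconfident logits are hypothetical. The split-bins screening method is not sketched because the abstract does not describe its procedure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - confidence| over bins.

    Confidence is the probability of the predicted class, max(p, 1 - p).
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    preds = (probs >= 0.5).astype(int)
    conf = np.where(preds == 1, probs, 1.0 - probs)
    correct = (preds == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - conf[in_bin].mean())
            ece += in_bin.mean() * gap  # in_bin.mean() is the bin's share of samples
    return float(ece)

def binary_nll(logits, labels):
    """Mean negative log-likelihood, computed stably from logits."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return float(np.mean(labels * np.logaddexp(0.0, -logits)
                         + (1.0 - labels) * np.logaddexp(0.0, logits)))

def fit_temperature(logits, labels):
    """Pick T > 0 minimizing validation NLL of sigmoid(logits / T).

    Grid search is an assumption made for brevity; T = 1 (no scaling)
    is always a candidate, so the fit never increases validation NLL.
    """
    logits = np.asarray(logits, dtype=float)
    temps = np.unique(np.append(np.linspace(0.05, 10.0, 200), 1.0))
    losses = [binary_nll(logits / t, labels) for t in temps]
    return float(temps[int(np.argmin(losses))])

# Synthetic data (assumption): labels follow sigmoid(true_logits),
# while the "model" reports logits three times too large.
rng = np.random.default_rng(0)
true_logits = rng.normal(0.0, 1.5, size=5000)
labels = (rng.random(5000) < sigmoid(true_logits)).astype(int)
logits = 3.0 * true_logits

val, test = slice(0, 2500), slice(2500, 5000)
T = fit_temperature(logits[val], labels[val])
ece_before = expected_calibration_error(sigmoid(logits[test]), labels[test])
ece_after = expected_calibration_error(sigmoid(logits[test] / T), labels[test])
print(f"T = {T:.2f}, ECE before = {ece_before:.3f}, ECE after = {ece_after:.3f}")
```

Because dividing a logit by a positive temperature never changes its sign, the predicted classes at a 0.5 threshold are identical before and after scaling. This is consistent with the abstract's statement that TS calibrates uncertainty without changing performance: only the confidence values move, so ECE can drop while accuracy stays fixed.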

