Explainability of Protein Deep Learning Models.

Author information

Fazel Zahra, de Souza Camila P E, Golding G Brian, Ilie Lucian

Affiliations

Department of Computer Science, University of Western Ontario, London, ON N6A 5B7, Canada.

Department of Statistical and Actuarial Sciences, University of Western Ontario, London, ON N6A 5B7, Canada.

Publication information

Int J Mol Sci. 2025 May 29;26(11):5255. doi: 10.3390/ijms26115255.

Abstract

Protein embeddings are the new main source of information about proteins, producing state-of-the-art solutions to many problems, including protein interaction prediction, a fundamental issue in proteomics. Understanding the embeddings and what causes the interactions is very important, as these models lack transparency due to their black-box nature. In the first study of its kind, we investigate the inner workings of these models using XAI (explainable AI) approaches. We perform extensive testing (3.3 TB of total data) involving nine of the best-known XAI methods on two problems: (i) the prediction of protein interaction sites using the current top method, Seq-InSite, and (ii) the production of protein embedding vectors using three methods, ProtBERT, ProtT5, and Ankh. The results are evaluated in terms of their ability to correlate with six basic amino acid properties (aromaticity, acidity/basicity, hydrophobicity, molecular mass, van der Waals volume, and dipole moment), as well as the propensity for interaction with other proteins, the impact of distant residues, and the infidelity scores of the XAI methods. The results are unexpected. Some XAI methods are much better than others at discovering essential information. Simple methods can be as good as advanced ones. Different protein embedding vectors can capture distinct properties, indicating significant room for improvement in embedding quality.
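
As a rough illustration of what correlating XAI output with an amino acid property can look like, the sketch below computes a Pearson correlation between per-residue attribution scores and hydrophobicity. It is not the authors' pipeline: the abstract does not say which correlation measure or hydrophobicity scale was used, so the Kyte-Doolittle scale, the Pearson coefficient, and the function names `pearson` and `attribution_hydrophobicity_correlation` are all assumptions made for this example. The attribution scores would come from whichever XAI method is being evaluated.

```python
from math import sqrt

# Kyte-Doolittle hydropathy values per residue.
# ASSUMPTION: the abstract does not name a hydrophobicity scale; this one is
# used here only because it is widely known.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def attribution_hydrophobicity_correlation(sequence, attributions):
    """Correlate one attribution score per residue with residue hydrophobicity.

    Hypothetical helper: `attributions` holds one score per position of
    `sequence`, produced by any XAI method.
    """
    if len(sequence) != len(attributions):
        raise ValueError("one attribution score is required per residue")
    hydrophobicity = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return pearson(attributions, hydrophobicity)
```

A coefficient near 1 or -1 would suggest that the attribution scores track the property closely, while a value near 0 would suggest they do not. The same pattern applies to the other properties listed in the abstract by swapping in a different per-residue lookup table.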
