DeepLoc 2.0: multi-label subcellular localization prediction using protein language models.

Affiliations

Indian Institute of Technology Madras, Chennai 600036, India.

Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen 2200, Denmark.

Publication information

Nucleic Acids Res. 2022 Jul 5;50(W1):W228-W234. doi: 10.1093/nar/gkac278.

Abstract

The prediction of protein subcellular localization is of great relevance for proteomics research. Here, we propose an update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability. For training and validation, we curate eukaryotic and human multi-location protein datasets with stringent homology partitioning and enriched with sorting signal information compiled from the literature. We achieve state-of-the-art performance in DeepLoc 2.0 by using a pre-trained protein language model. It has the further advantage that it uses sequence input rather than relying on slower protein profiles. We provide two means of better interpretability: an attention output along the sequence and highly accurate prediction of nine different types of protein sorting signals. We find that the attention output correlates well with the position of sorting signals. The webserver is available at services.healthtech.dtu.dk/service.php?DeepLoc-2.0.
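
The abstract names two ideas that a small example may make concrete: multi-label prediction (a protein can be assigned several locations at once) and an attention output along the sequence. The sketch below is a generic illustration of attention pooling followed by independent per-label sigmoids. It is not the DeepLoc 2.0 implementation; every function name, the toy embeddings, the weights, the label names, and the 0.5 threshold are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_pool(embeddings, attn_vector):
    """Pool per-residue embeddings into one vector.

    embeddings:  one vector per residue (toy stand-in for language-model output).
    attn_vector: hypothetical learned query vector (an assumption of this sketch).
    Returns the pooled vector and the per-residue attention weights.
    """
    scores = [sum(e * a for e, a in zip(emb, attn_vector)) for emb in embeddings]
    weights = softmax(scores)
    dim = len(embeddings[0])
    pooled = [sum(w * emb[d] for w, emb in zip(weights, embeddings)) for d in range(dim)]
    return pooled, weights


def predict_multilabel(pooled, label_weights, label_biases, labels, threshold=0.5):
    """Score each label independently, so several labels can pass the threshold."""
    probs = {}
    for name, w, b in zip(labels, label_weights, label_biases):
        probs[name] = sigmoid(sum(wi * pi for wi, pi in zip(w, pooled)) + b)
    predicted = [name for name, p in probs.items() if p >= threshold]
    return probs, predicted


# Toy inputs: three residues with 2-dimensional embeddings (made-up numbers).
embeddings = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
attn_vector = [1.0, 1.0]

# Illustrative label set and classifier parameters (not taken from the paper).
labels = ["Nucleus", "Cytoplasm", "Mitochondrion"]
label_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
label_biases = [0.0, 0.0, 0.0]

pooled, weights = attention_pool(embeddings, attn_vector)
probs, predicted = predict_multilabel(pooled, label_weights, label_biases, labels)

print("attention weights per residue:", weights)
print("label probabilities:", probs)
print("predicted locations:", predicted)
```

Because each label has its own sigmoid rather than sharing one softmax, the probabilities need not sum to one, which is what allows more than one location to be predicted. The attention weights form a distribution over residues, so in this kind of design they can be inspected to see which positions contributed most to the pooled representation.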

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/35f2/9252801/793890fdbe23/gkac278figgra1.jpg
