DeepLoc: prediction of protein subcellular localization using deep learning.

Affiliations

Department of Bio and Health Informatics, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark.

The Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen N, Denmark.

Publication information

Bioinformatics. 2017 Nov 1;33(21):3387-3395. doi: 10.1093/bioinformatics/btx431.

Abstract

MOTIVATION

The prediction of eukaryotic protein subcellular localization is a well-studied topic in bioinformatics because of its relevance to proteomics research. Many machine learning methods have been successfully applied to this task, but in most of them, predictions rely on annotations of homologues from knowledge databases. For novel proteins where no annotated homologues exist, and for predicting the effects of sequence variants, it is desirable to have methods for predicting protein properties from sequence information only.

RESULTS

Here, we present a prediction algorithm that uses deep neural networks to predict protein subcellular localization from sequence information alone. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism that identifies the protein regions important for subcellular localization. The model was trained and tested on a protein dataset extracted from one of the latest UniProt releases, in which experimentally annotated proteins follow more stringent criteria than before. We demonstrate that our model achieves good accuracy (78% for 10 categories; 92% for membrane-bound or soluble), outperforming current state-of-the-art algorithms, including those relying on homology information.
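To make the role of the attention mechanism concrete, the sketch below shows one common way attention can pool per-residue vectors into a single fixed-size vector while exposing which positions received the most weight. It is an illustration only, not the DeepLoc implementation: the function names, the dot-product scoring, the `query` vector and the toy numbers are all assumptions, and the hidden states stand in for the outputs of a recurrent network that is not modelled here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Collapse per-residue vectors into one fixed-size vector.

    hidden_states: list of equal-length float lists, one per residue
                   (stand-ins for recurrent-network outputs).
    query:         hypothetical scoring vector of the same length;
                   in a trained model this would be learned.
    Returns (context_vector, attention_weights).
    """
    # Score each position by its dot product with the query vector.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    # Normalise the scores so the weights are positive and sum to 1.
    weights = softmax(scores)
    # The context vector is the weighted average of the hidden states.
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return context, weights


# Toy input (assumed values): four residue positions, 2-dimensional states.
hidden = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2], [0.1, 0.1]]
query = [1.0, 1.0]

context, weights = attention_pool(hidden, query)
# Index of the residue position with the largest attention weight.
top_position = max(range(len(weights)), key=weights.__getitem__)
```

In a setup like this, the context vector would feed a classifier over the localization categories, and the attention weights can be inspected per position to see which sequence regions dominated the prediction.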

AVAILABILITY AND IMPLEMENTATION

The method is available as a web server at http://www.cbs.dtu.dk/services/DeepLoc. Example code is available at https://github.com/JJAlmagro/subcellular_localization. The dataset is available at http://www.cbs.dtu.dk/services/DeepLoc/data.php.

CONTACT

jjalma@dtu.dk.
