Predicting subcellular location of protein with evolution information and sequence-based deep learning.

Affiliations

Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, 1 Xuefu North Road, University Town, Fuzhou, 350122, FJ, China.

Department of Computer Science and Engineering, University of South Carolina, 550 Assembly St, Columbia, SC, 29208, USA.

Publication information

BMC Bioinformatics. 2021 Oct 22;22(Suppl 10):515. doi: 10.1186/s12859-021-04404-0.

Abstract

BACKGROUND

Protein subcellular localization prediction plays an important role in biological research. Because traditional methods are laborious and time-consuming, many machine learning-based prediction methods have been proposed. However, most of these methods ignore the evolutionary information of proteins. To improve prediction accuracy, we present a deep learning-based method for predicting protein subcellular locations.

RESULTS

Our method uses not only the amino acid sequences of proteins but also their evolutionary matrices. It combines a bidirectional long short-term memory (BiLSTM) network, which processes the entire protein sequence, with a convolutional neural network (CNN), which extracts features from the sequence. The position-specific scoring matrix (PSSM) is used as a supplement to the protein sequence. Our method was trained and tested on two benchmark datasets. Experimental results show that it yields accurate predictions on the two datasets, with an average precision of 0.7901, a ranking loss of 0.0758, and a coverage of 1.2848.
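Average precision, ranking loss and coverage are ranking metrics for multi-label prediction, where one protein may reside in several locations and the model outputs a score per candidate location. Higher average precision is better; lower ranking loss and coverage are better. The following is a minimal sketch of the definitions commonly used in multi-label learning, written in plain Python. The function names are hypothetical, and it is an assumption that the paper uses exactly these conventions (ties counted against the label being ranked, coverage reported as "rank of the deepest true label minus one").

```python
from typing import Sequence

# y_true[i] is a 0/1 vector over the candidate locations of protein i;
# y_score[i] is the model's real-valued score for each of those locations.
Labels = Sequence[int]
Scores = Sequence[float]


def _rank(scores: Scores, j: int) -> int:
    """1-based rank of label j; ties are counted against j (pessimistic)."""
    return sum(1 for s in scores if s >= scores[j])


def average_precision(y_true: Sequence[Labels], y_score: Sequence[Scores]) -> float:
    """For each relevant label, the fraction of labels ranked at or above it that
    are also relevant; averaged over relevant labels, then over proteins."""
    per_sample = []
    for labels, scores in zip(y_true, y_score):
        relevant = [j for j, t in enumerate(labels) if t]
        if not relevant:
            continue  # assumption: proteins without any annotated location are skipped
        precisions = [
            sum(1 for k in relevant if scores[k] >= scores[j]) / _rank(scores, j)
            for j in relevant
        ]
        per_sample.append(sum(precisions) / len(precisions))
    return sum(per_sample) / len(per_sample) if per_sample else 0.0


def ranking_loss(y_true: Sequence[Labels], y_score: Sequence[Scores]) -> float:
    """Mean fraction of (relevant, irrelevant) label pairs in which the
    irrelevant label scores at least as high as the relevant one."""
    per_sample = []
    for labels, scores in zip(y_true, y_score):
        relevant = [j for j, t in enumerate(labels) if t]
        irrelevant = [j for j, t in enumerate(labels) if not t]
        if not relevant or not irrelevant:
            per_sample.append(0.0)  # no pairs exist, so nothing can be mis-ordered
            continue
        misordered = sum(1 for r in relevant for i in irrelevant if scores[r] <= scores[i])
        per_sample.append(misordered / (len(relevant) * len(irrelevant)))
    return sum(per_sample) / len(per_sample) if per_sample else 0.0


def coverage(y_true: Sequence[Labels], y_score: Sequence[Scores]) -> float:
    """Mean number of steps down the ranked list needed to reach every
    relevant label (rank of the lowest-ranked relevant label, minus one)."""
    per_sample = []
    for labels, scores in zip(y_true, y_score):
        relevant = [j for j, t in enumerate(labels) if t]
        if not relevant:
            continue
        per_sample.append(max(_rank(scores, j) for j in relevant) - 1)
    return sum(per_sample) / len(per_sample) if per_sample else 0.0
```

For example, with y_true = [[1, 0, 0], [0, 1, 1]] and y_score = [[0.9, 0.5, 0.1], [0.8, 0.7, 0.1]], the second protein's only irrelevant location outscores both of its relevant ones. This gives an average precision of 19/24 (about 0.792), a ranking loss of 0.5 and a coverage of 1.0.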

CONCLUSION

The experimental results show that our method outperforms five currently available methods, indicating that it is an acceptable alternative for predicting protein subcellular location.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c699/8539821/ee6ad2713211/12859_2021_4404_Fig1_HTML.jpg