MsDBP: Exploring DNA-Binding Proteins by Integrating Multiscale Sequence Information via Chou's Five-Step Rule.

Author information

The School of Computer Science and Technology, Anhui University, Hefei, Anhui, China.

Department of Gastroenterology, The First Affiliated Hospital of Anhui Medical University, Hefei, Anhui, China.

Publication information

J Proteome Res. 2019 Aug 2;18(8):3119-3132. doi: 10.1021/acs.jproteome.9b00226. Epub 2019 Jul 17.

Abstract

DNA-binding proteins are crucial to alternative splicing, methylation, and the structural composition of DNA. Existing experimental methods for identifying DNA-binding proteins are expensive and time-consuming, so a fast and accurate computational method is needed. In this article, we report MsDBP, a novel DNA-binding protein predictor that feeds multiscale sequence features into a deep neural network. First, rather than developing a structure-based method with narrow applicability, we build a sequence-based predictor. Second, rather than characterizing the whole protein directly, we divide the protein into subsequences of different lengths and encode each one as a vector based on its composition information; together these vectors form the multiscale sequence feature. Finally, a branch of dense layers learns multilevel abstract features to discriminate DNA-binding proteins. When tested on the independent data set PDB2272, MsDBP achieves an overall accuracy of 66.99% with a sensitivity (SE) of 70.69%. We also perform extensive experiments comparing the proposed method with other existing methods. The results indicate that MsDBP would be a useful tool for identifying DNA-binding proteins. MsDBP is freely available as a web server at http://47.100.203.218/MsDBP/ .
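
The abstract does not specify how the subsequences are cut or exactly which composition encoding is used, so the following is only a minimal sketch of the general idea, not the MsDBP implementation. It assumes that each scale splits the protein into a given number of contiguous, nearly equal segments, and that each segment is encoded by its 20-dimensional amino acid composition. The function names and the default `scales=(1, 2, 4)` are hypothetical choices made for illustration.

```python
from typing import List, Sequence

# The 20 standard amino acids, in a fixed order that defines the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment: str) -> List[float]:
    """Return the fraction of each standard amino acid in a segment.

    Non-standard residue codes are ignored; an empty segment (or one with
    no standard residues) yields an all-zero vector.
    """
    counts = [segment.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def split_into_segments(sequence: str, n_segments: int) -> List[str]:
    """Split a sequence into n_segments contiguous, nearly equal parts."""
    length = len(sequence)
    bounds = [i * length // n_segments for i in range(n_segments + 1)]
    return [sequence[bounds[i]:bounds[i + 1]] for i in range(n_segments)]


def multiscale_features(sequence: str,
                        scales: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Concatenate segment compositions over several scales (assumed scheme).

    With the hypothetical default scales (1, 2, 4) the result has
    20 * (1 + 2 + 4) = 140 entries.
    """
    sequence = sequence.upper()
    features: List[float] = []
    for n_segments in scales:
        for segment in split_into_segments(sequence, n_segments):
            features.extend(composition(segment))
    return features


if __name__ == "__main__":
    vector = multiscale_features("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
    print(len(vector))
```

A fixed-length vector of this kind could then be passed to dense layers regardless of protein length, which is the property the multiscale encoding described above relies on.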
