通过基于位置特异性得分矩阵（PSSM）信息的压缩技术改进DNA结合蛋白的检测。

Improved detection of DNA-binding proteins via compression technology on PSSM information.

作者信息

Wang Yubo, Ding Yijie, Guo Fei, Wei Leyi, Tang Jijun

机构信息

School of Computer Science and Technology, Tianjin University, Tianjin 300350, China.

Tianjin University Institute of Computational Biology, Tianjin 300350, China.

出版信息

PLoS One. 2017 Sep 29;12(9):e0185587. doi: 10.1371/journal.pone.0185587. eCollection 2017.

DOI:10.1371/journal.pone.0185587

PMID:28961273

原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC5621689/

Abstract

Since the importance of DNA-binding proteins in multiple biomolecular functions has been recognized, an increasing number of researchers are attempting to identify DNA-binding proteins. In recent years, the machine learning methods have become more and more compelling in the case of protein sequence data soaring, because of their favorable speed and accuracy. In this paper, we extract three features from the protein sequence, namely NMBAC (Normalized Moreau-Broto Autocorrelation), PSSM-DWT (Position-specific scoring matrix-Discrete Wavelet Transform), and PSSM-DCT (Position-specific scoring matrix-Discrete Cosine Transform). We also employ feature selection algorithm on these feature vectors. Then, these features are fed into the training SVM (support vector machine) model as classifier to predict DNA-binding proteins. Our method applys three datasets, namely PDB1075, PDB594 and PDB186, to evaluate the performance of our approach. The PDB1075 and PDB594 datasets are employed for Jackknife test and the PDB186 dataset is used for the independent test. Our method achieves the best accuracy in the Jacknife test, from 79.20% to 86.23% and 80.5% to 86.20% on PDB1075 and PDB594 datasets, respectively. In the independent test, the accuracy of our method comes to 76.3%. The performance of independent test also shows that our method has a certain ability to be effectively used for DNA-binding protein prediction. The data and source code are at https://doi.org/10.6084/m9.figshare.5104084.

摘要

由于DNA结合蛋白在多种生物分子功能中的重要性已得到认可，越来越多的研究人员试图鉴定DNA结合蛋白。近年来，随着蛋白质序列数据的激增，机器学习方法因其良好的速度和准确性而变得越来越有吸引力。在本文中，我们从蛋白质序列中提取了三个特征，即归一化莫罗-布罗托自相关（NMBAC）、位置特异性评分矩阵-离散小波变换（PSSM-DWT）和位置特异性评分矩阵-离散余弦变换（PSSM-DCT）。我们还对这些特征向量采用了特征选择算法。然后，将这些特征输入到作为分类器的训练支持向量机（SVM）模型中，以预测DNA结合蛋白。我们的方法应用了三个数据集，即PDB1075、PDB594和PDB186，来评估我们方法的性能。PDB1075和PDB594数据集用于留一法检验，PDB186数据集用于独立检验。我们的方法在留一法检验中取得了最佳准确率，在PDB1075和PDB594数据集上分别为79.20%至86.23%和80.5%至86.20%。在独立检验中，我们方法的准确率达到了76.3%。独立检验的性能也表明我们的方法具有一定的有效用于DNA结合蛋白预测的能力。数据和源代码位于https://doi.org/10.6084/m9.figshare.5104084 。

Suppr 超能文献

文献检索

文件翻译

深度研究

Suppr 超能文献

文献检索

文件翻译

深度研究

通过基于位置特异性得分矩阵（PSSM）信息的压缩技术改进DNA结合蛋白的检测。

Improved detection of DNA-binding proteins via compression technology on PSSM information.

作者信息

机构信息

出版信息

相似文献

引用本文的文献

本文引用的文献

通过基于位置特异性得分矩阵（PSSM）信息的压缩技术改进DNA结合蛋白的检测。

Improved detection of DNA-binding proteins via compression technology on PSSM information.

作者信息

机构信息

出版信息

相似文献

引用本文的文献

本文引用的文献