Identification of DNA-binding proteins using multi-features fusion and binary firefly optimization algorithm.

Author information

Zhang Jian, Gao Bo, Chai Haiting, Ma Zhiqiang, Yang Guifu

Affiliations

School of Computer Science and Information Technology, Northeast Normal University, Changchun, 130117, People's Republic of China.

Office of Informatization Management and Planning, Northeast Normal University, Changchun, 130117, People's Republic of China.

Publication information

BMC Bioinformatics. 2016 Aug 26;17(1):323. doi: 10.1186/s12859-016-1201-8.

DOI:10.1186/s12859-016-1201-8

PMID:27565741

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5002159/

Abstract

BACKGROUND

DNA-binding proteins (DBPs) play fundamental roles in many biological processes. Developing effective computational tools for identifying DBPs is therefore highly desirable.

RESULTS

In this study, we proposed an accurate method for predicting DBPs. First, we focused on the challenge of improving DBP prediction accuracy using information from the sequence alone. Second, we used multiple informative features to encode each protein: the evolutionary conservation profile, secondary structure motifs, and physicochemical properties. Third, we introduced a novel improved Binary Firefly Algorithm (BFA) to remove redundant or noisy features and to select optimal parameters for the classifier. On two benchmark datasets, our predictor outperformed many state-of-the-art predictors, which demonstrates the effectiveness of our method. Its promising performance on a newly compiled independent test dataset from PDB and on a large-scale dataset from UniProt demonstrates the good generalization ability of our method. In addition, the BFA developed in this research has great potential for practical optimization applications, especially feature selection problems.
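The abstract does not specify how the improved BFA works, so the following is only a minimal sketch of a generic binary firefly search for feature selection, not the authors' algorithm. Everything in it is an assumption made for illustration: the function names, the sigmoid transfer used to binarize positions, the parameter values (beta0, gamma, alpha), and the toy fitness function, which stands in for a classifier's validation score. Classifier parameter selection, which the paper performs alongside feature selection, is omitted.

```python
import math
import random


def binary_firefly_select(fitness, n_features, n_fireflies=12, n_iter=40,
                          beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Generic binary firefly search (illustrative sketch).

    Maximizes fitness(mask), where mask is a list of 0/1 flags,
    one flag per candidate feature.
    """
    rng = random.Random(seed)
    swarm = [[rng.randint(0, 1) for _ in range(n_features)]
             for _ in range(n_fireflies)]
    light = [fitness(mask) for mask in swarm]  # brightness = fitness
    best = max(range(n_fireflies), key=lambda i: light[i])
    best_mask, best_fit = swarm[best][:], light[best]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] <= light[i]:
                    continue  # firefly i only moves toward brighter ones
                # Normalized Hamming distance between the two masks.
                r = sum(a != b for a, b in zip(swarm[i], swarm[j])) / n_features
                beta = beta0 * math.exp(-gamma * r * r)  # attractiveness
                moved = []
                for d in range(n_features):
                    # Continuous move toward firefly j plus random noise.
                    x = (swarm[i][d]
                         + beta * (swarm[j][d] - swarm[i][d])
                         + alpha * (rng.random() - 0.5))
                    # Sigmoid transfer: map position to P(bit = 1).
                    prob = 1.0 / (1.0 + math.exp(-4.0 * (x - 0.5)))
                    moved.append(1 if rng.random() < prob else 0)
                swarm[i] = moved
                light[i] = fitness(moved)
                if light[i] > best_fit:
                    best_mask, best_fit = moved[:], light[i]
    return best_mask, best_fit


INFORMATIVE = {0, 2, 5}  # hypothetical "useful" feature indices


def toy_fitness(mask):
    """Stand-in for a validation score: reward informative features,
    penalize the total number of selected features."""
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


if __name__ == "__main__":
    mask, score = binary_firefly_select(toy_fitness, n_features=10)
    print(mask, score)
```

In a real pipeline the toy fitness would be replaced by a cross-validated classifier score computed on the features selected by the mask, so that redundant or noisy features lower the fitness.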

CONCLUSIONS

A highly accurate method was proposed for the identification of DBPs. A user-friendly web-server named iDbP (identification of DNA-binding Proteins) was constructed and provided for academic use.

