Zhang Jian, Gao Bo, Chai Haiting, Ma Zhiqiang, Yang Guifu
School of Computer Science and Information Technology, Northeast Normal University, Changchun, 130117, People's Republic of China.
Office of Informatization Management and Planning, Northeast Normal University, Changchun, 130117, People's Republic of China.
BMC Bioinformatics. 2016 Aug 26;17(1):323. doi: 10.1186/s12859-016-1201-8.
DNA-binding proteins (DBPs) play fundamental roles in many biological processes. Therefore, the development of effective computational tools for identifying DBPs is highly desirable.
In this study, we proposed an accurate method for predicting DBPs. First, we focused on improving DBP prediction accuracy using sequence information alone. Second, we encoded each protein with multiple informative features, including its evolutionary conservation profile, secondary structure motifs, and physicochemical properties. Third, we introduced a novel improved Binary Firefly Algorithm (BFA) to remove redundant or noisy features and to select optimal parameters for the classifier. On two benchmark datasets, our predictor outperformed many state-of-the-art predictors, demonstrating the effectiveness of our method. Its promising performance on a newly compiled independent test dataset from PDB and on a large-scale dataset from UniProt demonstrated the good generalization ability of our method. In addition, the BFA developed in this research has great potential for practical applications in optimization, especially feature selection problems.
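The abstract does not give the details of the improved BFA, so the following is only a minimal sketch of a *generic* binary firefly algorithm for feature selection, not the authors' method. It assumes a few common choices: each firefly is a 0/1 mask over features, brightness equals a fitness score to maximize, distance is the normalized Hamming distance, and the continuous firefly step is turned into a bit-flip probability with a V-shaped transfer function, |tanh(step)|. The names `binary_firefly`, `toy_fitness`, and `INFORMATIVE` are hypothetical. In a real predictor the fitness would presumably be a cross-validated classifier score, and classifier parameters could be encoded as extra bits appended to the mask; both are assumptions here.

```python
import math
import random
from typing import Callable, List, Tuple


def binary_firefly(
    fitness: Callable[[List[int]], float],
    n_bits: int,
    n_fireflies: int = 20,
    n_iter: int = 50,
    beta0: float = 1.0,
    gamma: float = 1.0,
    alpha: float = 0.2,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Generic binary firefly sketch: maximize `fitness` over 0/1 masks."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_fireflies)]
    light = [fitness(x) for x in pop]  # brightness = fitness value
    b = max(range(n_fireflies), key=lambda k: light[k])
    best_x, best_f = pop[b][:], light[b]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] <= light[i]:
                    continue  # firefly i only moves toward brighter fireflies
                # Normalized Hamming distance between the two masks, in [0, 1].
                r = sum(a != c for a, c in zip(pop[i], pop[j])) / n_bits
                # Attractiveness decays with distance, as in the continuous FA.
                beta = beta0 * math.exp(-gamma * r * r)
                new_x = []
                for xi, xj in zip(pop[i], pop[j]):
                    # Continuous firefly step: attraction plus a small random walk.
                    step = beta * (xj - xi) + alpha * (rng.random() - 0.5)
                    # Assumed V-shaped transfer: larger |step| -> likelier bit flip.
                    flip = rng.random() < abs(math.tanh(step))
                    new_x.append(1 - xi if flip else xi)
                pop[i] = new_x
                light[i] = fitness(new_x)
                if light[i] > best_f:
                    best_x, best_f = new_x[:], light[i]
    return best_x, best_f


# Hypothetical toy fitness: features 0, 2 and 5 are "informative", and every
# selected feature costs 0.1 so that compact (less redundant) subsets win.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask: List[int]) -> float:
    return sum(mask[k] for k in INFORMATIVE) - 0.1 * sum(mask)


mask, score = binary_firefly(toy_fitness, n_bits=8)
print(mask, score)
```

The size penalty in the toy fitness mimics the goal stated above of discarding redundant or noisy features: among masks that keep all informative features, the one with the fewest extra bits scores highest. The V-shaped transfer is used instead of a sigmoid so that bits on which two fireflies already agree are rarely flipped.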
A highly accurate method was proposed for identifying DBPs. A user-friendly web server named iDbP (identification of DNA-binding Proteins) was built and made available for academic use.