School of Software, Central South University, Changsha, 410075, China.
Lab of Information Management, Changzhou University, Changzhou, 213164, China.
BMC Bioinformatics. 2018 Dec 31;19(Suppl 19):522. doi: 10.1186/s12859-018-2527-1.
Identifying the specific residues involved in protein-DNA interactions is of considerable importance for understanding the binding mechanism of protein-DNA complexes. Although many computational approaches for predicting DNA-binding residues have been developed, there is still significant room for improvement in overall performance and availability.
Here, we present an efficient approach termed PDRLGB that uses a light gradient boosting machine (LightGBM) to predict binding residues in protein-DNA complexes. First, we extract 913 sequence and structure features with a sliding window of 11. Then, we apply the random forest algorithm to rank the features in descending order of importance and obtain the optimal feature subset using incremental feature selection. Based on the selected feature set, we use LightGBM to build the prediction model for DNA-binding residues. Our PDRLGB method shows better overall predictive accuracy and shorter training time than other widely used machine learning (ML) methods such as random forest (RF), AdaBoost and support vector machine (SVM). We further compare PDRLGB with various existing approaches on the independent test datasets and show that it improves on the existing state-of-the-art approaches.
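The windowing and incremental feature selection steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the zero padding at chain ends, and the pluggable `evaluate` callback (which would wrap cross-validated training of a classifier such as LightGBM) are all assumptions made for illustration. Only the window size of 11 and the ranked, incremental selection strategy come from the abstract.

```python
from typing import Callable, List, Sequence

WINDOW = 11  # window size stated in the abstract


def sliding_windows(per_residue: Sequence[Sequence[float]],
                    window: int = WINDOW,
                    pad: float = 0.0) -> List[List[float]]:
    """Concatenate the feature vectors of each residue and its neighbours.

    Positions that fall off either end of the chain are filled with `pad`
    (zero padding is an assumption, not taken from the paper).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it is centred on a residue")
    half = window // 2
    n_feat = len(per_residue[0]) if per_residue else 0
    rows: List[List[float]] = []
    for i in range(len(per_residue)):
        row: List[float] = []
        for j in range(i - half, i + half + 1):
            if 0 <= j < len(per_residue):
                row.extend(per_residue[j])
            else:
                row.extend([pad] * n_feat)
        rows.append(row)
    return rows


def incremental_feature_selection(
        ranked: Sequence[int],
        evaluate: Callable[[List[int]], float]) -> List[int]:
    """Add features one at a time in ranked order; keep the best prefix.

    `ranked` holds feature indices sorted by descending importance (for
    example from a random forest). `evaluate` is a hypothetical callback
    that scores a candidate subset, e.g. by cross-validation.
    """
    best_score = float("-inf")
    best_subset: List[int] = []
    current: List[int] = []
    for feat in ranked:
        current.append(feat)
        score = evaluate(current)
        if score > best_score:
            best_score, best_subset = score, list(current)
    return best_subset
```

In practice, `ranked` would come from the importance scores of a fitted random forest and `evaluate` would train and score the classifier on the candidate columns. Keeping the scorer as a callback leaves the selection loop independent of any particular learning library.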
PDRLGB is an efficient approach to predict specific residues for protein-DNA interactions.