PDRLGB: precise DNA-binding residue prediction using a light gradient boosting machine.

Affiliations

School of Software, Central South University, Changsha, 410075, China.

Lab of Information Management, Changzhou University, Changzhou, 213164, China.

Publication information

BMC Bioinformatics. 2018 Dec 31;19(Suppl 19):522. doi: 10.1186/s12859-018-2527-1.

Abstract

BACKGROUND

Identifying the specific residues involved in protein-DNA interactions is of considerable importance for better understanding the binding mechanism of protein-DNA complexes. Although many computational approaches for predicting DNA-binding residues have been developed, there is still significant room for improvement in overall performance and availability.

RESULTS

Here, we present an efficient approach termed PDRLGB that uses a light gradient boosting machine (LightGBM) to predict binding residues in protein-DNA complexes. First, we extract 913 sequence and structure features using a sliding window of 11. Then, we apply the random forest algorithm to rank the features in descending order of importance and obtain the optimal feature subset using incremental feature selection. Based on the selected feature set, we use a light gradient boosting machine to build the prediction model for DNA-binding residues. Our PDRLGB method shows better overall predictive accuracy and requires less training time than other widely used machine learning (ML) methods such as random forest (RF), Adaboost and support vector machine (SVM). We further compare PDRLGB with various existing approaches on independent test datasets and show improved results over existing state-of-the-art approaches.
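The two generic steps named above, sliding-window feature construction and incremental feature selection, can be sketched in a few lines of dependency-free Python. This is an illustrative sketch, not the authors' implementation. The function names, the zero-padding at chain ends, and the pluggable `score` callable are all assumptions. In the setting the abstract describes, `ranked` would come from random forest importances and `score` from evaluating a LightGBM model on the candidate subset. The abstract does not state how many descriptors are computed per residue; 913 = 83 × 11 would be consistent with 83 per-residue descriptors, but that is an inference.

```python
from typing import Callable, List, Sequence


def window_features(per_residue: Sequence[Sequence[float]],
                    window: int = 11) -> List[List[float]]:
    """Concatenate per-residue feature vectors over a centered sliding window.

    Positions that fall off either end of the chain are zero-padded
    (an assumption; the abstract does not say how termini are handled).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centered on a residue")
    half = window // 2
    n = len(per_residue)
    dim = len(per_residue[0]) if n else 0
    pad = [0.0] * dim
    rows = []
    for i in range(n):
        row: List[float] = []
        for j in range(i - half, i + half + 1):
            row.extend(per_residue[j] if 0 <= j < n else pad)
        rows.append(row)
    return rows


def incremental_feature_selection(ranked: Sequence[int],
                                  score: Callable[[List[int]], float]) -> List[int]:
    """Add features one at a time in ranked order; return the best-scoring prefix.

    `score` is a hypothetical callable that evaluates a feature subset,
    for example by cross-validating a classifier restricted to those columns.
    """
    best_subset: List[int] = []
    best_score = float("-inf")
    current: List[int] = []
    for feature in ranked:
        current.append(feature)
        s = score(current)
        if s > best_score:
            best_score, best_subset = s, list(current)
    return best_subset
```

With a window of 11 and `d` descriptors per residue, each row returned by `window_features` has `11 * d` entries, so the feature count grows linearly with the window size. Incremental feature selection then trades that width back down by keeping only the ranked prefix that scores best.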

CONCLUSIONS

PDRLGB is an efficient approach to predict specific residues for protein-DNA interactions.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/00cb/6311926/8fcc8793f2cc/12859_2018_2527_Fig1_HTML.jpg
