Computationally identifying hot spots in protein-DNA binding interfaces using an ensemble approach.

Affiliations

Department of Computer Science and Technology, Tongji University, No. 4800 Caoan Road, Shanghai, 201804, China.

Shanghai Key Laboratory of Intelligent Information Processing, and School of Computer Science, Fudan University, No. 220 Handan Road, Shanghai, 200433, China.

Publication information

BMC Bioinformatics. 2020 Sep 17;21(Suppl 13):384. doi: 10.1186/s12859-020-03675-3.

Abstract

BACKGROUND

Protein-DNA interactions govern a large number of cellular processes, and they can be altered by a small fraction of interface residues, the so-called hot spots, which account for most of the interface binding free energy. Accurate prediction of hot spots is critical to understanding the principles of protein-DNA interactions. Some computational methods can already predict large numbers of hot-spot residues accurately and efficiently. However, the scarcity of experimentally validated hot-spot residues in protein-DNA complexes and the low diversity of the features employed limit the performance of existing methods.

RESULTS

Here, we report a new computational method for effectively predicting hot spots in protein-DNA binding interfaces. The method, called PreHots (short for Predicting Hotspots), adopts an ensemble stacking classifier that integrates different machine learning classifiers to generate a robust model with 19 features selected by a sequential backward feature selection algorithm. To this end, we constructed two new and reliable datasets (a benchmark dataset for model training and an independent dataset for validation), which together comprise 123 hot spots and 137 non-hot spots from 89 protein-DNA complexes. The data were manually collected from the literature and existing databases, followed by a strict redundancy-removal process. Our method achieves a sensitivity of 0.813 and an AUC of 0.868 in 10-fold cross-validation on the benchmark dataset, and a sensitivity of 0.818 and an AUC of 0.820 on the independent test dataset. The results show that our approach outperforms existing methods.
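To make the feature selection step concrete, the following is a minimal sketch of greedy sequential backward selection in plain Python. It is not the PreHots implementation: the function name, the `toy_score` criterion, and the per-feature usefulness values are all hypothetical. In practice the score would presumably be a cross-validated performance measure of the classifier on each candidate feature subset.

```python
from typing import Callable, FrozenSet, List, Tuple


def sequential_backward_selection(
    n_features: int,
    score: Callable[[FrozenSet[int]], float],
    n_keep: int,
) -> Tuple[FrozenSet[int], List[int]]:
    """Greedy backward elimination: start from all features, drop one per round.

    Each round removes the feature whose removal leaves the highest-scoring
    subset, until only n_keep features remain.
    """
    if not 1 <= n_keep <= n_features:
        raise ValueError("n_keep must be between 1 and n_features")
    selected = frozenset(range(n_features))
    removed: List[int] = []
    while len(selected) > n_keep:
        # Score every subset obtained by dropping one remaining feature.
        # Iterating in sorted order makes ties resolve to the lowest index.
        drop = max(sorted(selected), key=lambda f: score(selected - {f}))
        selected = selected - {drop}
        removed.append(drop)
    return selected, removed


# Hypothetical per-feature usefulness values, invented for illustration only.
USEFULNESS = [0.30, 0.02, 0.25, 0.01, 0.20]


def toy_score(subset: FrozenSet[int]) -> float:
    """Stand-in criterion: summed usefulness minus a small per-feature cost."""
    return sum(USEFULNESS[f] for f in subset) - 0.05 * len(subset)


selected, removed = sequential_backward_selection(5, toy_score, n_keep=3)
print(sorted(selected), removed)
```

With this toy criterion the two least useful features are eliminated first. A real scoring function is far more expensive, since every candidate subset requires retraining and re-evaluating the model.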

CONCLUSIONS

PreHots, which is based on a stacking ensemble of boosting algorithms, can reliably predict hot spots at the protein-DNA binding interface on a large scale. Compared with existing methods, PreHots achieves better prediction performance. Both the PreHots web server and the datasets are freely available at http://dmb.tongji.edu.cn/tools/PreHots/.
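Prediction performance in this abstract is reported as sensitivity and AUC. As a reminder of what those two measures compute, here is a small self-contained sketch in plain Python. The labels and scores are made-up toy values, not data from the study, and the 0.5 decision threshold is an assumption chosen for the example.

```python
from typing import Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true hot spots (label 1) that are predicted as hot spots."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive examples")
    return tp / (tp + fn)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive receives a higher
    score than a randomly chosen negative, counting ties as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = hot spot, 0 = non-hot spot (illustrative values only).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(sensitivity(y_true, y_pred), auc(y_true, scores))
```

Sensitivity depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two numbers are usually reported together.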

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ddd2/7495898/65c86fe2dde4/12859_2020_3675_Fig1_HTML.jpg