Prediction of hot spots in protein-DNA binding interfaces based on discrete wavelet transform and wavelet packet transform.

Affiliations

School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China.

Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China.

Publication information

BMC Bioinformatics. 2023 Apr 4;24(1):129. doi: 10.1186/s12859-023-05263-7.

Abstract

BACKGROUND

Identifying hot spots in protein-DNA binding interfaces is crucial both for understanding the mechanisms underlying protein-DNA interactions and for drug design. Experimental methods for identifying hot spots are time-consuming and expensive, and most existing computational methods predict hot spots from traditional protein-DNA features, which does not make full use of the information those features contain.

RESULTS

In this work, we propose a method named WTL-PDH for hot spot prediction. To address the class imbalance in the dataset, we used the Synthetic Minority Over-sampling Technique (SMOTE) to generate minority-class samples until the classes were balanced. We first extracted solvent accessible surface area features and structural features, and then processed these traditional features with the discrete wavelet transform and the wavelet packet transform to extract wavelet energy and wavelet entropy information, giving a total of 175 feature dimensions. To obtain the best feature subset, we systematically evaluated these features under various feature selection strategies. Finally, we built the model with the light gradient boosting machine (LightGBM).
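The abstract does not state the wavelet basis, the decomposition level, or the exact energy and entropy definitions used by WTL-PDH, so the following is only a minimal sketch under stated assumptions: a Haar wavelet, a two-level decomposition, band energy taken as the sum of squared coefficients, and wavelet entropy taken as the Shannon entropy of the relative band energies. The input is assumed to be one numeric feature sequence (for example, a hypothetical window of per-residue solvent accessible surface area values). All function names are hypothetical and are not taken from the WTL-PDH repository.

```python
import math
from typing import List, Tuple


def haar_step(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform, returning (approximation, detail)."""
    x = list(signal)
    if len(x) % 2 == 1:
        # Assumption: pad odd-length input by repeating the last value.
        x.append(x[-1])
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def dwt_bands(signal: List[float], level: int) -> List[List[float]]:
    """Multi-level DWT: only the approximation is split again.
    Returns the bands ordered as [a_L, d_L, d_(L-1), ..., d_1]."""
    approx = list(signal)
    details: List[List[float]] = []
    for _ in range(level):
        approx, d = haar_step(approx)
        details.append(d)
    return [approx] + details[::-1]


def wpt_bands(signal: List[float], level: int) -> List[List[float]]:
    """Wavelet packet transform: every node (approximation and detail) is split,
    giving 2**level terminal nodes."""
    nodes = [list(signal)]
    for _ in range(level):
        next_nodes: List[List[float]] = []
        for node in nodes:
            a, d = haar_step(node)
            next_nodes.extend([a, d])
        nodes = next_nodes
    return nodes


def energy_entropy(bands: List[List[float]]) -> Tuple[List[float], float]:
    """Band energies (sum of squared coefficients) and the Shannon entropy
    of the relative energies p_j = E_j / sum(E)."""
    energies = [sum(c * c for c in band) for band in bands]
    total = sum(energies)
    if total == 0.0:
        return energies, 0.0  # all-zero input: define the entropy as 0
    probs = [e / total for e in energies]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return energies, entropy


def wavelet_features(signal: List[float], level: int = 2) -> List[float]:
    """Hypothetical feature vector: DWT energies + entropy, then WPT energies + entropy."""
    features: List[float] = []
    for bands in (dwt_bands(signal, level), wpt_bands(signal, level)):
        energies, entropy = energy_entropy(bands)
        features.extend(energies)
        features.append(entropy)
    return features


# Toy 8-value feature sequence (made-up numbers, for illustration only).
example = [0.2, 0.5, 0.1, 0.9, 0.4, 0.3, 0.7, 0.6]
print(len(wavelet_features(example, level=2)))  # 9
```

For an 8-value input and `level=2`, the DWT contributes 3 band energies plus 1 entropy and the WPT contributes 4 band energies plus 1 entropy, so `wavelet_features` returns 9 values. Because the Haar transform used here is orthonormal, the band energies of each decomposition sum to the energy of the (padded) input. The DWT only re-splits the low-frequency branch, whereas the WPT also splits the high-frequency branches, which gives it a finer frequency partition.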

CONCLUSIONS

On the independent test set, our method achieved AUC, MCC and F1 scores of 0.838, 0.533 and 0.750, respectively. Compared with state-of-the-art methods, WTL-PDH generally achieves better performance in predicting hot spots. The dataset and source code are available at https://github.com/chase2555/WTL-PDH .
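The three reported metrics follow standard definitions, sketched below in plain Python for a binary task (label 1 = hot spot, 0 = non-hot spot). This is an illustration of those definitions, not code from the WTL-PDH repository; the toy labels, scores and the 0.5 decision threshold are hypothetical.

```python
import math
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Counts (TP, TN, FP, FN) with label 1 = hot spot, 0 = non-hot spot."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    tp, _, fp, fn = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; returns 0.0 when any marginal is empty."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count 1/2); O(n_pos * n_neg), fine for small examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels and classifier scores (not data from the paper).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.5]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
print(round(auc(labels, scores), 3), round(mcc(labels, preds), 3), round(f1_score(labels, preds), 3))
# 0.867 0.258 0.571
```

AUC is threshold-free and uses the raw scores, while MCC and F1 depend on the chosen decision threshold. MCC uses all four confusion-matrix cells, which makes it informative when hot spots are the minority class. In practice, `sklearn.metrics.roc_auc_score`, `matthews_corrcoef` and `f1_score` compute the same quantities.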
