Target-DBPPred: An intelligent model for prediction of DNA-binding proteins using discrete wavelet transform based compression and light eXtreme gradient boosting.

Affiliations

Department of Elementary and Secondary Education, Peshawar, Khyber Pakhtunkhwa, Pakistan; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.

Department of Computer Science, College of Computer Science, King Khalid University, Abha, Saudi Arabia.

Publication information

Comput Biol Med. 2022 Jun;145:105533. doi: 10.1016/j.compbiomed.2022.105533. Epub 2022 Apr 16.

Abstract

DNA-protein interaction is a critical biological process underlying activities such as DNA transcription and recombination. DNA-binding proteins (DBPs) are closely associated with several human diseases (asthma, cancer, and AIDS), and some DBPs are used in the production of antibiotics, steroids, and anti-inflammatories. Several methods have been reported for the prediction of DBPs; however, a more accurate method is still highly desirable. This study presents an intelligent computational method, Target-DBPPred, to improve DBP prediction. Features are extracted from primary protein sequences via a novel feature descriptor, EDF-PSSM-DWT (evolutionary difference formula position-specific scoring matrix-discrete wavelet transform), and several other evolutionary descriptors, namely F-PSSM (filtered position-specific scoring matrix), EDF-PSSM (evolutionary difference formula position-specific scoring matrix), PSSM-DPC (position-specific scoring matrix-dipeptide composition), and Lead-BiPSSM (lead-bigram-position-specific scoring matrix), to capture diverse multivariate features. The best feature set for each descriptor is selected using sequential forward selection (SFS). Four models are then trained using the AdaBoost, XGB (eXtreme gradient boosting), ERT (extremely randomized trees), and LiXGB (light eXtreme gradient boosting) classifiers. LiXGB, with the best feature set of EDF-PSSM-DWT, attained 6.69% and 15.07% higher accuracy on the training and testing datasets, respectively. These results indicate improved performance of the proposed predictor over existing predictors.
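
The abstract does not spell out how the discrete wavelet transform compresses a PSSM, so the following is only a minimal sketch of the general idea, not the authors' EDF-PSSM-DWT descriptor. It assumes a Haar wavelet, two decomposition levels, and four summary statistics per coefficient band; all of these choices, as well as the function names `haar_step`, `dwt_summary`, and `compress_matrix` and the toy matrices, are hypothetical. The evolutionary difference formula step is omitted entirely. The point illustrated is that each column of a variable-length L x C score matrix can be reduced to a fixed number of values, so proteins of different lengths yield feature vectors of equal size.

```python
import math
from statistics import mean, pstdev


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        # Assumption: pad odd-length signals by repeating the last value.
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def dwt_summary(signal, levels=2):
    """Summarise a variable-length signal as 4 * (levels + 1) statistics."""
    feats = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        # Keep max, min, mean, and standard deviation of each detail band.
        feats += [max(detail), min(detail), mean(detail), pstdev(detail)]
    # Finish with the same statistics for the final approximation band.
    feats += [max(approx), min(approx), mean(approx), pstdev(approx)]
    return feats


def compress_matrix(matrix, levels=2):
    """Apply dwt_summary to every column of an L x C matrix (list of rows)."""
    n_cols = len(matrix[0])
    feats = []
    for c in range(n_cols):
        column = [row[c] for row in matrix]
        feats += dwt_summary(column, levels)
    return feats


# Toy stand-ins for PSSMs (a real PSSM has 20 columns; 3 are used for brevity).
toy_short = [
    [1.0, 0.0, -1.0],
    [2.0, 1.0, 0.0],
    [3.0, 0.0, 1.0],
    [4.0, 1.0, 0.0],
    [5.0, 0.0, -1.0],
]
toy_long = toy_short * 3  # a "protein" three times as long

features_short = compress_matrix(toy_short)
features_long = compress_matrix(toy_long)
print(len(features_short), len(features_long))  # prints "36 36"
```

Both toy inputs yield 36 features (3 columns x 12 statistics) despite their different lengths. Fixed-length vectors of this kind could then be passed to a feature selector such as SFS and to a gradient-boosting classifier, as the abstract describes.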
