Enabling full-length evolutionary profiles based deep convolutional neural network for predicting DNA-binding proteins from sequence.

Affiliation

School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India.

Publication information

Proteins. 2020 Jan;88(1):15-30. doi: 10.1002/prot.25763. Epub 2019 Jul 8.

Abstract

Sequence-based DNA-binding protein (DBP) prediction is a widely studied biological problem. Sliding windows on rows of position-specific substitution matrices (PSSMs) predict DNA-binding residues well in known DBPs, but the same models cannot be applied to protein sequences of unequal length. PSSM summaries representing column averages and their amino-acid-wise versions have been used effectively for the task, but it remains unclear whether these features carry all of the PSSM's predictive power, which has traditionally been harnessed for binding-site prediction. Here we evaluate whether PSSMs scaled up to a fixed size by zero-vector padding (pPSSM) perform better than the summary-based features on similar models. Using a multilayer perceptron (MLP) and a deep convolutional neural network (CNN), we found that (a) summary features work well for single-genome (human-only) data but are outperformed by pPSSM on diverse PDB-derived data sets, suggesting greater summary-level redundancy in the former; (b) even when summary features perform comparably to pPSSM, a consensus of the two outperforms both; (c) CNN models comprehensively outperform their corresponding MLP models; and (d) the actual predicted scores from different models depend on the choice of input feature set, whereas overall performance levels are model-dependent, with CNN achieving the highest accuracy.
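The two input representations compared in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The function names, the 20-column layout, the amino-acid ordering, and the handling of over-long sequences and absent residue types are all assumptions made for illustration; the abstract specifies only zero-vector padding to a fixed size, column averages, and their amino-acid-wise versions.

```python
from typing import List

Matrix = List[List[float]]  # one row per residue, one column per amino-acid type

N_COLS = 20                      # assumed: one PSSM score per standard amino acid
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering, for illustration only


def pad_pssm(pssm: Matrix, max_len: int) -> Matrix:
    """Append zero vectors so every protein yields a max_len x 20 matrix (pPSSM)."""
    if len(pssm) > max_len:
        # The abstract does not say how longer sequences are handled.
        raise ValueError("sequence is longer than max_len")
    padding = [[0.0] * N_COLS for _ in range(max_len - len(pssm))]
    return [list(row) for row in pssm] + padding


def column_average(pssm: Matrix) -> List[float]:
    """Length-independent summary: the mean of each PSSM column (20 values)."""
    n_rows = len(pssm)
    return [sum(row[j] for row in pssm) / n_rows for j in range(N_COLS)]


def amino_acid_wise_average(pssm: Matrix, sequence: str) -> List[float]:
    """Column means computed separately over the rows of each residue type
    (20 x 20 = 400 values). Residue types absent from the sequence
    contribute zeros, which is an assumption of this sketch."""
    features: List[float] = []
    for aa in AMINO_ACIDS:
        rows = [row for row, res in zip(pssm, sequence) if res == aa]
        if rows:
            features.extend(sum(r[j] for r in rows) / len(rows) for j in range(N_COLS))
        else:
            features.extend([0.0] * N_COLS)
    return features


# Toy 3-residue "PSSM" with made-up scores, for the sequence "ACA".
toy_pssm = [[float(i + j) for j in range(N_COLS)] for i in range(3)]
padded = pad_pssm(toy_pssm, max_len=5)
summary_20 = column_average(toy_pssm)
summary_400 = amino_acid_wise_average(toy_pssm, "ACA")
```

The padded matrix keeps every row of the original profile, so a CNN can in principle exploit positional patterns, whereas both summaries collapse the sequence dimension and discard residue order.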

Similar articles

PSSM-based prediction of DNA binding sites in proteins.

BMC Bioinformatics. 2005 Feb 19;6:33. doi: 10.1186/1471-2105-6-33.

DeepDNAbP: A deep learning-based hybrid approach to improve the identification of deoxyribonucleic acid-binding proteins.

Comput Biol Med. 2022 Jun;145:105433. doi: 10.1016/j.compbiomed.2022.105433. Epub 2022 Mar 30.

Weakly-Supervised Convolutional Neural Network Architecture for Predicting Protein-DNA Binding.

IEEE/ACM Trans Comput Biol Bioinform. 2020 Mar-Apr;17(2):679-689. doi: 10.1109/TCBB.2018.2864203. Epub 2018 Aug 7.

An improved deep learning method for predicting DNA-binding proteins based on contextual features in amino acid sequences.

PLoS One. 2019 Nov 14;14(11):e0225317. doi: 10.1371/journal.pone.0225317. eCollection 2019.

HN-PPISP: a hybrid network based on MLP-Mixer for protein-protein interaction site prediction.

Brief Bioinform. 2023 Jan 19;24(1). doi: 10.1093/bib/bbac480.

iDRBP_MMC: Identifying DNA-Binding Proteins and RNA-Binding Proteins Based on Multi-Label Learning Model and Motif-Based Convolutional Neural Network.

J Mol Biol. 2020 Nov 6;432(22):5860-5875. doi: 10.1016/j.jmb.2020.09.008. Epub 2020 Sep 11.

PredDRBP-MLP: Prediction of DNA-binding proteins and RNA-binding proteins by multilayer perceptron.

Comput Biol Med. 2023 Sep;164:107317. doi: 10.1016/j.compbiomed.2023.107317. Epub 2023 Aug 7.

PreDBP-PLMs: Prediction of DNA-binding proteins based on pre-trained protein language models and convolutional neural networks.

Anal Biochem. 2024 Nov;694:115603. doi: 10.1016/j.ab.2024.115603. Epub 2024 Jul 8.

Prediction of mono- and di-nucleotide-specific DNA-binding sites in proteins using neural networks.

BMC Struct Biol. 2009 May 13;9:30. doi: 10.1186/1472-6807-9-30.

Cited by

DRBP-EDP: classification of DNA-binding proteins and RNA-binding proteins using ESM-2 and dual-path neural network.

NAR Genom Bioinform. 2025 May 19;7(2):lqaf058. doi: 10.1093/nargab/lqaf058. eCollection 2025 Jun.

Prediction of the Trimer Protein Interface Residue Pair by CNN-GRU Model Based on Multi-Feature Map.

Nanomaterials (Basel). 2025 Jan 24;15(3):188. doi: 10.3390/nano15030188.

Systematic discovery of DNA-binding tandem repeat proteins.

Nucleic Acids Res. 2024 Sep 23;52(17):10464-10489. doi: 10.1093/nar/gkae710.

Predictive modeling of moonlighting DNA-binding proteins.

NAR Genom Bioinform. 2022 Dec 2;4(4):lqac091. doi: 10.1093/nargab/lqac091. eCollection 2022 Dec.

BoT-Net: a lightweight bag of tricks-based neural network for efficient LncRNA-miRNA interaction prediction.

Interdiscip Sci. 2022 Dec;14(4):841-862. doi: 10.1007/s12539-022-00535-x. Epub 2022 Aug 10.

Prediction of Transcription Factor Binding Sites Using a Combined Deep Learning Approach.

Front Oncol. 2022 Jun 3;12:893520. doi: 10.3389/fonc.2022.893520. eCollection 2022.

Research on DNA-Binding Protein Identification Method Based on LSTM-CNN Feature Fusion.

Comput Math Methods Med. 2022 Jun 2;2022:9705275. doi: 10.1155/2022/9705275. eCollection 2022.

PredDBP-Stack: Prediction of DNA-Binding Proteins from HMM Profiles using a Stacked Ensemble Method.

Biomed Res Int. 2020 Apr 13;2020:7297631. doi: 10.1155/2020/7297631. eCollection 2020.

HMMPred: Accurate Prediction of DNA-Binding Proteins Based on HMM Profiles and XGBoost Feature Selection.

Comput Math Methods Med. 2020 Mar 28;2020:1384749. doi: 10.1155/2020/1384749. eCollection 2020.

Computational Identification and Analysis of Ubiquinone-Binding Proteins.

Cells. 2020 Feb 24;9(2):520. doi: 10.3390/cells9020520.
