Single-stranded and double-stranded DNA-binding protein prediction using HMM profiles.

Author information

School of Electrical and Electronics Engineering, Fiji National University, Suva, Fiji.

Laboratory of Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan; Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan; Laboratory of Medical Science Mathematics, Department of Biological Sciences, Graduate School of Science, University of Tokyo, Tokyo, 113-0033, Japan.

Publication information

Anal Biochem. 2021 Jan 1;612:113954. doi: 10.1016/j.ab.2020.113954. Epub 2020 Sep 15.

DOI:10.1016/j.ab.2020.113954

PMID:32946833

Abstract

BACKGROUND

DNA-binding proteins play important roles in cellular processes and are involved in many biological activities. These proteins contain crucial protein-DNA binding domains and can interact with single-stranded or double-stranded DNA; accordingly, they are classified as single-stranded DNA-binding proteins (SSBs) or double-stranded DNA-binding proteins (DSBs). Computational prediction of SSBs and DSBs helps in annotating protein functions and understanding protein-binding domains.

RESULTS

Performance is reported on the DNA-binding protein dataset recently introduced by Wang et al. [1]. On the independent test set, the proposed method achieved a sensitivity of 0.600, specificity of 0.792, AUC of 0.758, MCC of 0.369, accuracy of 0.744, and F-measure of 0.536.
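For readers unfamiliar with these metrics, the following minimal sketch shows how sensitivity, specificity, accuracy, F-measure, and MCC are conventionally derived from a binary confusion matrix. The function name `binary_metrics` and the counts in the usage example are hypothetical and are not taken from the paper; the abstract also does not state which class (SSB or DSB) is treated as positive. AUC is omitted because it requires ranked prediction scores rather than confusion counts.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives.
    """
    sensitivity = _ratio(tp, tp + fn)   # recall on the positive class
    specificity = _ratio(tn, tn + fp)   # recall on the negative class
    precision = _ratio(tp, tp + fp)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f_measure = _ratio(2 * precision * sensitivity, precision + sensitivity)
    # Matthews correlation coefficient
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _ratio(tp * tn - fp * fn, mcc_den)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (not the paper's data).
metrics = binary_metrics(tp=30, fp=20, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because MCC uses all four cells of the confusion matrix, it is often reported alongside accuracy when the two classes are imbalanced.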

CONCLUSION

The proposed method, which uses hidden Markov model (HMM) profiles for feature extraction, outperformed the benchmark method in the literature, with an overall improvement of approximately 3%. The source code and supplementary information for the proposed method are available at https://github.com/roneshsharma/Predict-DNA-binding-proteins/wiki.
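The abstract does not describe how the HMM profiles are converted into features. As an illustration only, the sketch below assumes a profile is available as a matrix with one row per residue and one column per amino acid, and reduces it to a fixed-length vector using column means ("monogram") and averaged products of adjacent positions ("bigram"). This is one common way to summarize profile matrices and is an assumption here, not the authors' confirmed procedure; the function name `profile_features` and the toy matrix are hypothetical.

```python
def profile_features(profile):
    """Summarize an L x C profile matrix as a fixed-length feature vector.

    profile: list of L rows, each a list of C numbers (C would be 20 for
    amino acids). Returns C monogram values followed by C*C bigram values.
    """
    length = len(profile)
    n_cols = len(profile[0])

    # Monogram: mean of each column over all sequence positions.
    monogram = [sum(row[i] for row in profile) / length for i in range(n_cols)]

    # Bigram: for each column pair (i, j), average the product of column i
    # at position k and column j at position k + 1.
    bigram = []
    for i in range(n_cols):
        for j in range(n_cols):
            total = sum(
                profile[k][i] * profile[k + 1][j] for k in range(length - 1)
            )
            bigram.append(total / (length - 1) if length > 1 else 0.0)

    return monogram + bigram


# Toy 3-residue, 2-column profile (hypothetical values, not real HMM output).
toy_profile = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 0.0],
]
features = profile_features(toy_profile)
print(len(features))  # 2 monogram + 4 bigram values
```

The resulting vector has the same length regardless of sequence length, which is what allows proteins of different sizes to be fed to a standard classifier.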

Similar articles

Single-stranded and double-stranded DNA-binding protein prediction using HMM profiles.

Anal Biochem. 2021 Jan 1;612:113954. doi: 10.1016/j.ab.2020.113954. Epub 2020 Sep 15.

Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences.

BMC Bioinformatics. 2017 Jun 12;18(1):300. doi: 10.1186/s12859-017-1715-8.

PredPSD: A Gradient Tree Boosting Approach for Single-Stranded and Double-Stranded DNA Binding Protein Prediction.

Molecules. 2019 Dec 26;25(1):98. doi: 10.3390/molecules25010098.

SDBP-Pred: Prediction of single-stranded and double-stranded DNA-binding proteins by extending consensus sequence and K-segmentation strategies into PSSM.

Anal Biochem. 2020 Jan 15;589:113494. doi: 10.1016/j.ab.2019.113494. Epub 2019 Nov 3.

Identification of single-stranded and double-stranded DNA binding proteins based on protein structure.

BMC Bioinformatics. 2014;15 Suppl 12(Suppl 12):S4. doi: 10.1186/1471-2105-15-S12-S4. Epub 2014 Nov 6.

DPP-PseAAC: A DNA-binding protein prediction model using Chou's general PseAAC.

J Theor Biol. 2018 Sep 7;452:22-34. doi: 10.1016/j.jtbi.2018.05.006. Epub 2018 May 16.

Surface shapes and surrounding environment analysis of single- and double-stranded DNA-binding proteins in protein-DNA interface.

Proteins. 2016 Jul;84(7):979-89. doi: 10.1002/prot.25045. Epub 2016 Apr 16.

Sequence-based prediction of DNA-binding residues in proteins with conservation and correlation information.

IEEE/ACM Trans Comput Biol Bioinform. 2012 Nov-Dec;9(6):1766-75. doi: 10.1109/TCBB.2012.106.

Analysis and classification of DNA-binding sites in single-stranded and double-stranded DNA-binding proteins using protein information.

IET Syst Biol. 2014 Aug;8(4):176-83. doi: 10.1049/iet-syb.2013.0048.

Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation.

BMC Syst Biol. 2015;9 Suppl 1(Suppl 1):S10. doi: 10.1186/1752-0509-9-S1-S10. Epub 2015 Feb 6.

Cited by

TransBind allows precise detection of DNA-binding proteins and residues using language models and deep learning.

Commun Biol. 2025 Apr 5;8(1):568. doi: 10.1038/s42003-025-07534-w.

Accurate prediction of nucleic acid binding proteins using protein language model.

Bioinform Adv. 2025 Jan 20;5(1):vbaf008. doi: 10.1093/bioadv/vbaf008. eCollection 2025.

Improved prediction of DNA and RNA binding proteins with deep learning models.

Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae285.

Single-Stranded DNA Binding Proteins and Their Identification Using Machine Learning-Based Approaches.

Biomolecules. 2022 Aug 26;12(9):1187. doi: 10.3390/biom12091187.

DeepFeature: feature selection in nonimage data using convolutional neural network.

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab297.
