iDNA-Prot|dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition.

Authors

Liu Bin, Xu Jinghao, Lan Xun, Xu Ruifeng, Zhou Jiyun, Wang Xiaolong, Chou Kuo-Chen

Affiliations

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China; Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China; Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China; Gordon Life Science Institute, Belmont, Massachusetts, United States of America.

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China.

Publication information

PLoS One. 2014 Sep 3;9(9):e106691. doi: 10.1371/journal.pone.0106691. eCollection 2014.

DOI:10.1371/journal.pone.0106691

PMID:25184541

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4153653/

Abstract

Playing crucial roles in various cellular processes, such as recognition of specific nucleotide sequences, regulation of transcription, and regulation of gene expression, DNA-binding proteins are essential components of both eukaryotic and prokaryotic proteomes. With the avalanche of protein sequences generated in the postgenomic age, it is a critical challenge to develop automated methods for accurately and rapidly identifying DNA-binding proteins based on their sequence information alone. Here, a novel predictor, called "iDNA-Prot|dis", was established by incorporating the amino acid distance-pair coupling information and the amino acid reduced alphabet profile into the general pseudo amino acid composition (PseAAC) vector. The former captures the characteristics of DNA-binding proteins and so enhances prediction quality, while the latter reduces the dimension of the PseAAC vector and so speeds up prediction. Rigorous jackknife and independent dataset tests showed that the new predictor outperformed the existing predictors for the same purpose. As a user-friendly web-server, iDNA-Prot|dis is accessible to the public at http://bioinformatics.hitsz.edu.cn/iDNA-Prot_dis/. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step protocol guide is provided on how to use the web-server to get their desired results without the need to follow the complicated mathematical equations, which are presented in this paper only for the completeness of the development process. It is anticipated that the iDNA-Prot|dis predictor may become a useful high-throughput tool for large-scale analysis of DNA-binding proteins, or at the very least, play a complementary role to the existing predictors in this regard.
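
The two ideas named in the abstract, distance-pair coupling and a reduced amino acid alphabet, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the residue grouping in `GROUPS`, the function names, and the `max_d` parameter are all hypothetical choices made for illustration, and the exact formulation used by iDNA-Prot|dis is given only in the full paper.

```python
from itertools import product

# Hypothetical reduced alphabet: each group of residues collapses to the
# group's first letter. This grouping is an assumption for illustration
# only, not the clustering used by iDNA-Prot|dis.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
REDUCE = {aa: group[0] for group in GROUPS for aa in group}
ALPHABET = [group[0] for group in GROUPS]


def reduce_sequence(seq):
    """Map a protein sequence onto the reduced alphabet, skipping unknown letters."""
    return "".join(REDUCE[aa] for aa in seq.upper() if aa in REDUCE)


def distance_pair_features(seq, max_d=2):
    """Return normalized frequencies of single reduced letters, followed by
    frequencies of letter pairs separated by d = 1..max_d positions."""
    reduced = reduce_sequence(seq)
    n = len(reduced)
    # Single-letter composition of the reduced sequence.
    features = [reduced.count(a) / n if n else 0.0 for a in ALPHABET]
    # Pair composition at each distance d.
    for d in range(1, max_d + 1):
        counts = {pair: 0 for pair in product(ALPHABET, repeat=2)}
        for i in range(n - d):
            counts[(reduced[i], reduced[i + d])] += 1
        total = max(n - d, 1)  # avoid division by zero on very short sequences
        features.extend(counts[pair] / total for pair in product(ALPHABET, repeat=2))
    return features


if __name__ == "__main__":
    vector = distance_pair_features("MKRAVE", max_d=2)
    print(len(vector))
```

With this assumed 7-letter alphabet and `max_d=2`, the vector has 7 + 2 × 49 = 105 entries, whereas the same construction over all 20 amino acids would have 20 + 2 × 400 = 820. That is the sense in which a reduced alphabet shrinks the feature vector while the distance pairs still capture coupling between residues that are not adjacent.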

Similar articles

iDNA-Prot: identification of DNA binding proteins using random forest with grey model.

PLoS One. 2011;6(9):e24756. doi: 10.1371/journal.pone.0024756. Epub 2011 Sep 15.

iHSP-PseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition.

Anal Biochem. 2013 Nov 1;442(1):118-25. doi: 10.1016/j.ab.2013.05.024. Epub 2013 Jun 10.

iDNA-Methyl: identifying DNA methylation sites via pseudo trinucleotide composition.

Anal Biochem. 2015 Apr 1;474:69-77. doi: 10.1016/j.ab.2014.12.009. Epub 2015 Jan 14.

DNA binding protein identification by combining pseudo amino acid composition and profile-based protein representation.

Sci Rep. 2015 Oct 20;5:15479. doi: 10.1038/srep15479.

enDNA-Prot: identification of DNA-binding proteins by applying ensemble learning.

Biomed Res Int. 2014;2014:294279. doi: 10.1155/2014/294279. Epub 2014 May 26.

iMiRNA-PseDPC: microRNA precursor identification with a pseudo distance-pair composition approach.

J Biomol Struct Dyn. 2016;34(1):223-35. doi: 10.1080/07391102.2015.1014422. Epub 2015 Mar 3.

iSNO-PseAAC: predict cysteine S-nitrosylation sites in proteins by incorporating position specific amino acid propensity into pseudo amino acid composition.

PLoS One. 2013;8(2):e55844. doi: 10.1371/journal.pone.0055844. Epub 2013 Feb 7.

Predicting secretory proteins of malaria parasite by incorporating sequence evolution information into pseudo amino acid composition via grey system model.

PLoS One. 2012;7(11):e49040. doi: 10.1371/journal.pone.0049040. Epub 2012 Nov 26.

iHyd-PseAAC: predicting hydroxyproline and hydroxylysine in proteins by incorporating dipeptide position-specific propensity into pseudo amino acid composition.

Int J Mol Sci. 2014 May 5;15(5):7594-610. doi: 10.3390/ijms15057594.

Cited by

MvAl-MFP: A Multi-Label Classification Method on the Functions of Peptides with Multi-View Active Learning.

Curr Issues Mol Biol. 2025 Aug 6;47(8):628. doi: 10.3390/cimb47080628.

A Comprehensive Review on RNA Subcellular Localization Prediction.

ArXiv. 2025 Apr 24:arXiv:2504.17162v1.

TransBind allows precise detection of DNA-binding proteins and residues using language models and deep learning.

Commun Biol. 2025 Apr 5;8(1):568. doi: 10.1038/s42003-025-07534-w.

Benchmarking recent computational tools for DNA-binding protein identification.

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae634.

iMFP-LG: Identify Novel Multi-functional Peptides Using Protein Language Models and Graph-based Deep Learning.

Genomics Proteomics Bioinformatics. 2025 Jan 15;22(6). doi: 10.1093/gpbjnl/qzae084.

Systematic discovery of DNA-binding tandem repeat proteins.

Nucleic Acids Res. 2024 Sep 23;52(17):10464-10489. doi: 10.1093/nar/gkae710.

AMP-RNNpro: a two-stage approach for identification of antimicrobials using probabilistic features.

Sci Rep. 2024 Jun 5;14(1):12892. doi: 10.1038/s41598-024-63461-6.

ProkDBP: Toward more precise identification of prokaryotic DNA binding proteins.

Protein Sci. 2024 Jun;33(6):e5015. doi: 10.1002/pro.5015.

Protein feature engineering framework for AMPylation site prediction.

Sci Rep. 2024 Apr 15;14(1):8695. doi: 10.1038/s41598-024-58450-8.

StackDPP: a stacking ensemble based DNA-binding protein prediction model.

BMC Bioinformatics. 2024 Mar 14;25(1):111. doi: 10.1186/s12859-024-05714-9.

References

Protein Remote Homology Detection by Combining Chou's Pseudo Amino Acid Composition and Profile-Based Protein Representation.

Mol Inform. 2013 Oct;32(9-10):775-82. doi: 10.1002/minf.201300084. Epub 2013 Jul 24.

iNitro-Tyr: prediction of nitrotyrosine sites in proteins with general pseudo amino acid composition.

PLoS One. 2014 Aug 14;9(8):e105018. doi: 10.1371/journal.pone.0105018. eCollection 2014.

iTIS-PseTNC: a sequence-based predictor for identifying translation initiation site in human genes using pseudo trinucleotide composition.

Anal Biochem. 2014 Oct 1;462:76-83. doi: 10.1016/j.ab.2014.06.022. Epub 2014 Jul 10.

Discriminating protein structure classes by incorporating Pseudo Average Chemical Shift to Chou's general PseAAC and Support Vector Machine.

Comput Methods Programs Biomed. 2014 Oct;116(3):184-92. doi: 10.1016/j.cmpb.2014.06.007. Epub 2014 Jun 21.

iCTX-type: a sequence-based predictor for identifying the types of conotoxins in targeting ion channels.

Biomed Res Int. 2014;2014:286419. doi: 10.1155/2014/286419. Epub 2014 Jun 1.

iMethyl-PseAAC: identification of protein methylation sites via a pseudo amino acid composition approach.

Biomed Res Int. 2014;2014:947416. doi: 10.1155/2014/947416. Epub 2014 May 22.

iSS-PseDNC: identifying splicing sites using pseudo dinucleotide composition.

Biomed Res Int. 2014;2014:623149. doi: 10.1155/2014/623149. Epub 2014 May 21.

iHyd-PseAAC: predicting hydroxyproline and hydroxylysine in proteins by incorporating dipeptide position-specific propensity into pseudo amino acid composition.

Int J Mol Sci. 2014 May 5;15(5):7594-610. doi: 10.3390/ijms15057594.

Chou's pseudo amino acid composition improves sequence-based antifreeze protein prediction.

J Theor Biol. 2014 Sep 7;356:30-5. doi: 10.1016/j.jtbi.2014.04.006. Epub 2014 Apr 13.

PseKNC: a flexible web server for generating pseudo K-tuple nucleotide composition.

Anal Biochem. 2014 Jul 1;456:53-60. doi: 10.1016/j.ab.2014.04.001. Epub 2014 Apr 13.
