iDNA-Prot|dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition.

Authors

Liu Bin, Xu Jinghao, Lan Xun, Xu Ruifeng, Zhou Jiyun, Wang Xiaolong, Chou Kuo-Chen

Affiliations

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China; Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China; Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China; Gordon Life Science Institute, Belmont, Massachusetts, United States of America.

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China.

Publication information

PLoS One. 2014 Sep 3;9(9):e106691. doi: 10.1371/journal.pone.0106691. eCollection 2014.

DOI: 10.1371/journal.pone.0106691
PMID: 25184541
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4153653/
Abstract

Playing crucial roles in various cellular processes, such as recognition of specific nucleotide sequences, regulation of transcription, and regulation of gene expression, DNA-binding proteins are essential ingredients for both eukaryotic and prokaryotic proteomes. With the avalanche of protein sequences generated in the postgenomic age, it is a critical challenge to develop automated methods for accurately and rapidly identifying DNA-binding proteins based on their sequence information alone. Here, a novel predictor, called "iDNA-Prot|dis", was established by incorporating the amino acid distance-pair coupling information and the amino acid reduced alphabet profile into the general pseudo amino acid composition (PseAAC) vector. The former can capture the characteristics of DNA-binding proteins so as to enhance the prediction quality, while the latter can reduce the dimension of the PseAAC vector so as to speed up the prediction process. Rigorous jackknife and independent dataset tests showed that the new predictor outperformed the existing predictors for the same purpose. As a user-friendly web-server, iDNA-Prot|dis is accessible to the public at http://bioinformatics.hitsz.edu.cn/iDNA-Prot_dis/. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step protocol guide is provided on how to use the web-server to get their desired results without the need to follow the complicated mathematical equations, which are presented in this paper only for the integrity of the development process. It is anticipated that the iDNA-Prot|dis predictor may become a useful high-throughput tool for large-scale analysis of DNA-binding proteins, or at the very least, play a complementary role to the existing predictors in this regard.
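
The two ingredients named in the abstract, distance-pair coupling and a reduced amino acid alphabet, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the six-group alphabet in `REDUCED_GROUPS`, the function names, the `max_distance` parameter, and the normalization by the number of pairs are all assumptions made for illustration. The sketch maps each residue to a group letter, then records the group composition plus, for each distance d from 1 to `max_distance`, the frequency of every ordered group pair separated by d positions. With g groups the vector has g + g*g*max_distance entries, which shows why a smaller alphabet shrinks the feature dimension.

```python
from collections import Counter
from itertools import product

# Hypothetical reduced alphabet: a rough physicochemical grouping chosen for
# illustration only. It is NOT the clustering used by iDNA-Prot|dis.
REDUCED_GROUPS = {
    "a": "AVLIMC",  # aliphatic and sulfur-containing
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "k": "KR",      # positively charged
    "d": "DE",      # negatively charged
    "g": "GP",      # glycine and proline
}
RESIDUE_TO_GROUP = {
    aa: group for group, members in REDUCED_GROUPS.items() for aa in members
}


def reduce_sequence(seq):
    """Map each standard residue to its group letter; skip anything else."""
    return [RESIDUE_TO_GROUP[aa] for aa in seq.upper() if aa in RESIDUE_TO_GROUP]


def distance_pair_features(seq, max_distance=3):
    """Return group composition followed by distance-pair frequencies."""
    reduced = reduce_sequence(seq)
    letters = sorted(REDUCED_GROUPS)
    n = len(reduced)

    # Part 1: frequency of each group letter in the reduced sequence.
    composition = Counter(reduced)
    features = [composition[x] / n if n else 0.0 for x in letters]

    # Part 2: for each distance d, frequency of every ordered pair (x, y)
    # where y occurs exactly d positions after x.
    for d in range(1, max_distance + 1):
        pairs = Counter(zip(reduced, reduced[d:]))
        total = max(n - d, 0)  # number of pairs available at this distance
        for x, y in product(letters, repeat=2):
            features.append(pairs[(x, y)] / total if total else 0.0)
    return features


vector = distance_pair_features("MKRAVDE", max_distance=2)
print(len(vector))  # 6 + 36 * 2 = 78
```

A vector like this would then be passed to a classifier. Under the same grouping assumption, the full 20-letter alphabet would give 20 + 400 * 2 = 820 entries for `max_distance=2`, compared with 78 here.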
