
PSSM-based prediction of DNA binding sites in proteins.

Author information

Shandar Ahmad, Akinori Sarai

Affiliation

Department of Bioinformatics and Bioscience, Kyushu Institute of Technology, Iizuka 820-8502, Fukuoka, Japan.

Publication information

BMC Bioinformatics. 2005 Feb 19;6:33. doi: 10.1186/1471-2105-6-33.

DOI:10.1186/1471-2105-6-33
PMID:15720719
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC550660/
Abstract

BACKGROUND

Detection of DNA-binding sites in proteins is of enormous interest for technologies targeting gene regulation and manipulation. We have previously shown that a residue and its sequence-neighbor information can be used to predict DNA-binding candidates in a protein sequence. This sequence-based prediction method is applicable even when no sequence homology with a previously known DNA-binding protein is observed. Here we implement a neural-network-based algorithm that exploits the evolutionary information in amino acid sequences, in the form of position-specific scoring matrices (PSSMs), to improve the prediction of DNA-binding sites.
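The abstract does not give the network architecture or the neighbor window size, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each residue is described by its own PSSM row plus the rows of two neighbors on each side (`half_window=2`, a hypothetical choice), that raw log-odds scores are squashed with a logistic function, and that a single logistic neuron stands in for the trained network. The names `window_features` and `predict_binding` are hypothetical.

```python
import math
from typing import List, Sequence

# Assumed column order of one PSSM row (NCBI PSI-BLAST ASCII order).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def scale(x: float) -> float:
    """Numerically stable logistic squashing of a score into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def window_features(
    pssm: Sequence[Sequence[int]], i: int, half_window: int = 2
) -> List[float]:
    """Encode residue i plus its sequence neighbors i-half_window..i+half_window.

    Each in-chain position contributes its 20 scaled PSSM scores; positions
    beyond either terminus are padded with 0.0, so the vector always has
    (2 * half_window + 1) * 20 entries.
    """
    width = len(AMINO_ACIDS)
    features: List[float] = []
    for j in range(i - half_window, i + half_window + 1):
        if 0 <= j < len(pssm):
            features.extend(scale(score) for score in pssm[j])
        else:
            features.extend([0.0] * width)
    return features


def predict_binding(
    features: Sequence[float],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> bool:
    """Hypothetical one-neuron network: logistic output >= threshold means 'binding'."""
    activation = bias + sum(w * x for w, x in zip(weights, features))
    return scale(activation) >= threshold
```

Calling `window_features(pssm, i)` for every position `i` yields one 100-value input vector per residue under these assumptions; in a real predictor the weights would be learned from residues labeled as DNA-binding, a step this sketch omits.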

RESULTS

Using PSSMs, the average of sensitivity and specificity is up to 8.7% better than that of prediction with sequence information only. PSSMs generated from much smaller data sets can be used with minimal loss of prediction accuracy.
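The performance figure above is the average of sensitivity, TP/(TP+FN), and specificity, TN/(TN+FP), computed over residues labeled binding or non-binding. A plausible reason to report this average rather than plain accuracy is that it weights both classes equally when binding residues are a minority. The helper names below are hypothetical; this is a sketch of the metric, not the authors' evaluation code.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[bool], y_pred: Sequence[bool]
) -> Tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP) for per-residue binding labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lists differ in length")
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif truth:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def net_prediction(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Average of sensitivity TP/(TP+FN) and specificity TN/(TN+FP)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one binding and one non-binding residue")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2
```

For example, with two binding residues of which one is found, and four non-binding residues of which one is falsely flagged, sensitivity is 0.5, specificity is 0.75 and `net_prediction` returns 0.625.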

CONCLUSION

One problem with PSSM-derived prediction is that it requires lengthy, time-consuming alignments against large sequence databases. To speed up PSSM generation, we tried different reference data sets (sequence spaces) against which a target protein is scanned during PSI-BLAST iterations. We find that a very small set of proteins can serve as such reference data without losing much of the prediction value. This makes PSSM generation very rapid and even amenable to genome-level use. A web server has been developed that provides these DNA-binding-site predictions for any new protein from its amino acid sequence.
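The abstract does not say which BLAST version, iteration count, or reference sets were used. As a hedged sketch of the workflow it describes, the code below assumes NCBI BLAST+ (`makeblastdb`, and `psiblast` with `-num_iterations` and `-out_ascii_pssm`), draws a small reference set by random subsampling (a hypothetical stand-in for the authors' reference sets), and parses the 20 log-odds columns from the ASCII PSSM. All helper names, file names, and the default of 3 iterations are assumptions.

```python
import random
import subprocess
from pathlib import Path
from typing import List


def sample_reference_set(fasta_text: str, n: int, seed: int = 0) -> str:
    """Hypothetical helper: draw n random records from a FASTA string."""
    records = [">" + chunk for chunk in fasta_text.split(">") if chunk.strip()]
    return "".join(random.Random(seed).sample(records, min(n, len(records))))


def makeblastdb_cmd(ref_fasta: str, db_name: str) -> List[str]:
    """BLAST+ command that indexes the small reference set as a protein database."""
    return ["makeblastdb", "-in", ref_fasta, "-dbtype", "prot", "-out", db_name]


def psiblast_cmd(
    query_fasta: str, db_name: str, pssm_out: str, iterations: int = 3
) -> List[str]:
    """BLAST+ command that runs PSI-BLAST iterations and writes an ASCII PSSM."""
    return [
        "psiblast", "-query", query_fasta, "-db", db_name,
        "-num_iterations", str(iterations), "-out_ascii_pssm", pssm_out,
    ]


def parse_ascii_pssm(text: str) -> List[List[int]]:
    """Keep the 20 log-odds scores from each residue row of an ASCII PSSM."""
    rows: List[List[int]] = []
    for line in text.splitlines():
        parts = line.split()
        # Residue rows start with a position number and a one-letter residue code.
        if (len(parts) >= 22 and parts[0].isdigit()
                and len(parts[1]) == 1 and parts[1].isalpha()):
            rows.append([int(value) for value in parts[2:22]])
    return rows


def build_pssm(query_fasta: str, ref_fasta: str, workdir: str) -> List[List[int]]:
    """Index the reference set, run PSI-BLAST against it, and parse the PSSM."""
    work = Path(workdir)
    db_name = str(work / "ref_db")
    pssm_path = work / "query.pssm"
    subprocess.run(makeblastdb_cmd(ref_fasta, db_name), check=True)
    subprocess.run(psiblast_cmd(query_fasta, db_name, str(pssm_path)), check=True)
    if not pssm_path.exists():
        # psiblast may skip writing a PSSM when the query finds no hits.
        raise RuntimeError("psiblast produced no PSSM; try a larger reference set")
    return parse_ascii_pssm(pssm_path.read_text())
```

A typical run would write the output of `sample_reference_set` to a file such as `ref.fasta` and then call `build_pssm("query.fasta", "ref.fasta", "work")`, with BLAST+ on the `PATH`. Shrinking `n` trades PSSM quality for speed, which is the trade-off the abstract reports as favorable.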

AVAILABILITY

Online predictions based on this method are available at http://www.netasa.org/dbs-pssm/



Similar articles

1
PSSM-based prediction of DNA binding sites in proteins.
BMC Bioinformatics. 2005 Feb 19;6:33. doi: 10.1186/1471-2105-6-33.
2
A neural network method for prediction of beta-turn types in proteins using evolutionary information.
Bioinformatics. 2004 Nov 1;20(16):2751-8. doi: 10.1093/bioinformatics/bth322. Epub 2004 May 14.
3
Design of accurate predictors for DNA-binding sites in proteins using hybrid SVM-PSSM method.
Biosystems. 2007 Jul-Aug;90(1):234-41. doi: 10.1016/j.biosystems.2006.08.007. Epub 2006 Aug 23.
4
Fast model-based protein homology detection without alignment.
Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.
5
DP-Bind: a web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins.
Bioinformatics. 2007 Mar 1;23(5):634-6. doi: 10.1093/bioinformatics/btl672. Epub 2007 Jan 19.
6
SVM based prediction of RNA-binding proteins using binding residues and evolutionary information.
J Mol Recognit. 2011 Mar-Apr;24(2):303-13. doi: 10.1002/jmr.1061.
7
Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information.
Bioinformatics. 2004 Mar 1;20(4):477-86. doi: 10.1093/bioinformatics/btg432. Epub 2004 Jan 22.
8
Correlation and prediction of gene expression level from amino acid and dipeptide composition of its protein.
BMC Bioinformatics. 2005 Mar 17;6:59. doi: 10.1186/1471-2105-6-59.
9
A comparison of position-specific score matrices based on sequence and structure alignments.
Protein Sci. 2002 Feb;11(2):361-70. doi: 10.1110/ps.19902.
10
GANN: genetic algorithm neural networks for the detection of conserved combinations of features in DNA.
BMC Bioinformatics. 2005 Feb 22;6:36. doi: 10.1186/1471-2105-6-36.

Cited by

1
MKFGO: integrating multi-source knowledge fusion with pretrained language model for high-accuracy protein function prediction.
Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf420.
2
Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf016.
3
AttABseq: an attention-based deep learning prediction method for antigen-antibody binding affinity changes based on protein sequences.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae304.
4
A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond.
Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae162.
5
DRBpred: A sequence-based machine learning method to effectively predict DNA- and RNA-binding residues.
Comput Biol Med. 2024 Mar;170:108081. doi: 10.1016/j.compbiomed.2024.108081. Epub 2024 Jan 29.
6
HybridDBRpred: improved sequence-based prediction of DNA-binding amino acids using annotations from structured complexes and disordered proteins.
Nucleic Acids Res. 2024 Jan 25;52(2):e10. doi: 10.1093/nar/gkad1131.
7
A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins.
Sci Rep. 2023 Nov 20;13(1):20304. doi: 10.1038/s41598-023-47496-9.
8
Predictive modeling of moonlighting DNA-binding proteins.
NAR Genom Bioinform. 2022 Dec 2;4(4):lqac091. doi: 10.1093/nargab/lqac091. eCollection 2022 Dec.
9
Host-pathogen protein-nucleic acid interactions: A comprehensive review.
Comput Struct Biotechnol J. 2022 Aug 4;20:4415-4436. doi: 10.1016/j.csbj.2022.08.001. eCollection 2022.
10
PSSMCOOL: a comprehensive R package for generating evolutionary-based descriptors of protein sequences from PSSM profiles.
Biol Methods Protoc. 2022 Mar 30;7(1):bpac008. doi: 10.1093/biomethods/bpac008. eCollection 2022.

References

1
Moment-based prediction of DNA-binding proteins.
J Mol Biol. 2004 Jul 30;341(1):65-71. doi: 10.1016/j.jmb.2004.05.058.
2
Accurate prediction of solvent accessibility using neural networks-based regression.
Proteins. 2004 Sep 1;56(4):753-67. doi: 10.1002/prot.20176.
3
Protein sequence databases.
Curr Opin Chem Biol. 2004 Feb;8(1):76-80. doi: 10.1016/j.cbpa.2003.12.004.
4
Analysis and prediction of DNA-binding proteins and their binding residues based on composition, sequence and structural information.
Bioinformatics. 2004 Mar 1;20(4):477-86. doi: 10.1093/bioinformatics/btg432. Epub 2004 Jan 22.
5
Annotating nucleic acid-binding function based on protein structure.
J Mol Biol. 2003 Feb 28;326(4):1065-79. doi: 10.1016/s0022-2836(03)00031-7.
6
Specificity of protein-DNA recognition revealed by structure-based potentials: symmetric/asymmetric and cognate/non-cognate binding.
J Mol Biol. 2002 Oct 4;322(5):907-15. doi: 10.1016/s0022-2836(02)00846-x.
7
Protein-DNA interactions: amino acid conservation and the effects of mutations on binding specificity.
J Mol Biol. 2002 Jul 26;320(5):991-1009. doi: 10.1016/s0022-2836(02)00571-5.
8
Geometric analysis and comparison of protein-DNA interfaces: why is there no simple code for recognition?
J Mol Biol. 2000 Aug 18;301(3):597-624. doi: 10.1006/jmbi.2000.3918.
9
Application of multiple sequence alignment profiles to improve protein secondary structure prediction.
Proteins. 2000 Aug 15;40(3):502-11. doi: 10.1002/1097-0134(20000815)40:3<502::aid-prot170>3.0.co;2-q.
10
The Protein Data Bank.
Nucleic Acids Res. 2000 Jan 1;28(1):235-42. doi: 10.1093/nar/28.1.235.