
Predicting Hot Spot Residues at Protein-DNA Binding Interfaces Based on Sequence Information.

Affiliations

Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China.

School of Computer Science and Technology, Anhui University, Hefei, 230601, Anhui, China.

Publication information

Interdiscip Sci. 2021 Mar;13(1):1-11. doi: 10.1007/s12539-020-00399-z. Epub 2020 Oct 17.

DOI: 10.1007/s12539-020-00399-z
PMID: 33068261

Abstract

Hot spot residues at protein-DNA binding interfaces are important for investigating the underlying mechanism of molecular recognition. Currently, a few tools are available for identifying hot spot residues in protein-DNA complexes, and these tools require three-dimensional protein structures. However, three-dimensional structures are unavailable for most proteins. To address this limitation, we proposed a method, named SPDH, for predicting hot spot residues based only on protein sequences. First, we obtained 133 features from physicochemical properties, conservation, predicted solvent accessible surface area and structure. Then, we systematically assessed these features using various feature selection methods to obtain the optimal feature subset, and compared models built with four classical machine learning algorithms (support vector machine, random forest, logistic regression, and k-nearest neighbor) on the training dataset. We found that the variability of physicochemical property features between wild and mutant types was important for improving the performance of the prediction model. On the independent test set, our method achieved an AUC of 0.760 and a sensitivity of 0.808, and outperformed other methods. The data and source code can be downloaded at https://github.com/xialab-ahu/SPDH.
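
The abstract reports two evaluation metrics, AUC and sensitivity. As a reading aid, the sketch below shows how these two quantities are commonly computed for a binary hot spot / non-hot spot classifier. It is not taken from the SPDH source code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the numbers are unrelated to the paper's results.

```python
from typing import Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true hot spots (label 1) that are predicted as hot spots."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive examples")
    return tp / (tp + fn)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via its rank interpretation.

    AUC equals the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = hot spot, 0 = non-hot spot.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.5]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
predicted = [1 if s >= 0.5 else 0 for s in scores]

print(f"AUC = {auc(labels, scores):.4f}")
print(f"Sensitivity = {sensitivity(labels, predicted):.2f}")
```

AUC is computed from the raw scores and does not depend on a threshold, whereas sensitivity depends on the threshold chosen to produce class predictions. The pairwise loop is quadratic in the number of examples, which is fine for small test sets; a rank-based implementation would scale better.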


Similar articles

1
Predicting Hot Spot Residues at Protein-DNA Binding Interfaces Based on Sequence Information.
Interdiscip Sci. 2021 Mar;13(1):1-11. doi: 10.1007/s12539-020-00399-z. Epub 2020 Oct 17.
2
A feature-based approach to predict hot spots in protein-DNA binding interfaces.
Brief Bioinform. 2020 May 21;21(3):1038-1046. doi: 10.1093/bib/bbz037.
3
An improved DNA-binding hot spot residues prediction method by exploring interfacial neighbor properties.
BMC Bioinformatics. 2021 May 17;22(Suppl 3):253. doi: 10.1186/s12859-020-03871-1.
4
Protein-protein interface hot spots prediction based on a hybrid feature selection strategy.
BMC Bioinformatics. 2018 Jan 15;19(1):14. doi: 10.1186/s12859-018-2009-5.
5
Prediction of hot spots in protein-DNA binding interfaces based on discrete wavelet transform and wavelet packet transform.
BMC Bioinformatics. 2023 Apr 4;24(1):129. doi: 10.1186/s12859-023-05263-7.
6
APIS: accurate prediction of hot spots in protein interfaces by combining protrusion index with solvent accessibility.
BMC Bioinformatics. 2010 Apr 8;11:174. doi: 10.1186/1471-2105-11-174.
7
Accurate prediction of hot spot residues through physicochemical characteristics of amino acid sequences.
Proteins. 2013 Aug;81(8):1351-62. doi: 10.1002/prot.24278.
8
Enhanced Prediction of Hot Spots at Protein-Protein Interfaces Using Extreme Gradient Boosting.
Sci Rep. 2018 Sep 24;8(1):14285. doi: 10.1038/s41598-018-32511-1.
9
Computationally identifying hot spots in protein-DNA binding interfaces using an ensemble approach.
BMC Bioinformatics. 2020 Sep 17;21(Suppl 13):384. doi: 10.1186/s12859-020-03675-3.
10
XGBPRH: Prediction of Binding Hot Spots at Protein-RNA Interfaces Utilizing Extreme Gradient Boosting.
Genes (Basel). 2019 Mar 21;10(3):242. doi: 10.3390/genes10030242.

Cited by

1
Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbaf016.
2
Prediction of Protein-DNA Interface Hot Spots Based on Empirical Mode Decomposition and Machine Learning.
Genes (Basel). 2024 May 23;15(6):676. doi: 10.3390/genes15060676.
3
Prediction of hot spots in protein-DNA binding interfaces based on discrete wavelet transform and wavelet packet transform.
BMC Bioinformatics. 2023 Apr 4;24(1):129. doi: 10.1186/s12859-023-05263-7.
