EL_PSSM-RT: DNA-binding residue prediction by integrating ensemble learning with PSSM Relation Transformation.

Author information

Zhou Jiyun, Lu Qin, Xu Ruifeng, He Yulan, Wang Hongpeng

Affiliations

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, Guangdong, 518055, China.

Department of Computing, the Hong Kong Polytechnic University, Kowloon, Hong Kong.

Publication information

BMC Bioinformatics. 2017 Aug 29;18(1):379. doi: 10.1186/s12859-017-1792-8.

DOI:10.1186/s12859-017-1792-8

PMID:28851273

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5576297/

Abstract

BACKGROUND

Prediction of DNA-binding residues is important for understanding the protein-DNA recognition mechanism. Many computational methods have been proposed for this task, but most of them do not consider the relationships of evolutionary information between residues.

RESULTS

In this paper, we first propose a novel residue encoding method, Position Specific Score Matrix (PSSM) Relation Transformation (PSSM-RT), which encodes residues using the relationships of evolutionary information between residues. PSSM-RT and two existing PSSM encoding methods are evaluated on PDNA-62 and PDNA-224 by five-fold cross-validation. The evaluations indicate that PSSM-RT is more effective than these existing methods, which supports the view that the relationship of evolutionary information between residues is useful in DNA-binding residue prediction. We also propose an ensemble learning classifier (EL_PSSM-RT) that combines an ensemble learning model with PSSM-RT to better handle the imbalance between binding and non-binding residues in the datasets. EL_PSSM-RT is evaluated by five-fold cross-validation on PDNA-62 and PDNA-224 and on two independent datasets, TS-72 and TS-61. Comparisons with existing predictors on the four datasets show that EL_PSSM-RT performs best among all compared methods, with improvements of 0.02-0.07 in Matthews correlation coefficient (MCC), 4.18-21.47% in sensitivity (ST), and 0.013-0.131 in area under the curve (AUC). Furthermore, we analyze the importance of the pair-relationships extracted by PSSM-RT, and the results validate the usefulness of PSSM-RT for encoding DNA-binding residues.
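As a reading aid for the reported metrics, the following is a minimal sketch of how sensitivity (ST) and MCC are computed from a confusion matrix, using their standard definitions. The labels and predictions are made-up toy values, not data from the paper, and the function names are illustrative only.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels: 1 = binding, 0 = non-binding."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def sensitivity(tp, fn):
    """Fraction of true binding residues that are predicted as binding."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy, imbalanced example (assumed values): 2 binding and 8 non-binding residues.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
all_negative = [0] * len(y_true)  # trivial predictor that never predicts binding

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
st_value = sensitivity(tp, fn)
mcc_value = mcc(tp, tn, fp, fn)
trivial_mcc = mcc(*confusion_counts(y_true, all_negative))
```

On this toy input the trivial all-negative predictor is right on 8 of 10 residues yet has an MCC of 0 and a sensitivity of 0, which illustrates why MCC and ST are informative on imbalanced binding-site data.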

CONCLUSIONS

We propose a novel method for predicting DNA-binding residues that incorporates the relationship of evolutionary information between residues together with ensemble learning. Performance evaluation shows that the relationship of evolutionary information between residues is indeed useful in DNA-binding residue prediction and that ensemble learning can be used to address the data imbalance between binding and non-binding residues. A web service for EL_PSSM-RT (http://hlt.hitsz.edu.cn:8080/PSSM-RT_SVM/) is freely available to the biological research community.
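The abstract does not describe how the ensemble is constructed, so the following is only a sketch of one common way ensemble learning is used against class imbalance: the majority (non-binding) samples are split into chunks, each chunk is paired with all minority (binding) samples, one base learner is trained per balanced subset, and predictions are combined by majority vote. The nearest-centroid base learner, the function names, and the toy feature vectors are all assumptions made for illustration and are not taken from EL_PSSM-RT.

```python
import random

def balanced_subsets(pos, neg, seed=0):
    """Pair all minority samples with each chunk of the shuffled majority samples."""
    rng = random.Random(seed)
    shuffled = neg[:]
    rng.shuffle(shuffled)
    n_chunks = max(1, len(shuffled) // max(1, len(pos)))
    chunks = [shuffled[i::n_chunks] for i in range(n_chunks)]
    return [(pos, chunk) for chunk in chunks]

def fit_centroid(pos, neg):
    """Hypothetical base learner: store the mean feature vector of each class."""
    def mean(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]
    return mean(pos), mean(neg)

def predict_one(model, x):
    """Predict 1 (binding) if x is closer to the binding centroid, else 0."""
    c_pos, c_neg = model
    d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
    d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
    return 1 if d_pos < d_neg else 0

def ensemble_predict(models, x):
    """Majority vote over the base learners."""
    votes = sum(predict_one(m, x) for m in models)
    return 1 if 2 * votes > len(models) else 0

# Toy features (assumed): 3 binding samples near (2, 2), 12 non-binding near (0, 0).
pos = [[2.0, 2.0], [2.2, 1.8], [1.8, 2.1]]
neg = [[0.1 * (i % 4), 0.1 * (i // 4)] for i in range(12)]

models = [fit_centroid(p, n) for p, n in balanced_subsets(pos, neg)]
label_near_pos = ensemble_predict(models, [1.9, 2.0])
label_near_neg = ensemble_predict(models, [0.1, 0.1])
```

Each base learner sees a 1:1 class ratio, so no single model is dominated by the non-binding class, while the ensemble as a whole still uses every non-binding sample.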


