
Improved detection of DNA-binding proteins via compression technology on PSSM information.

Authors

Wang Yubo, Ding Yijie, Guo Fei, Wei Leyi, Tang Jijun

Affiliations

School of Computer Science and Technology, Tianjin University, Tianjin 300350, China.

Tianjin University Institute of Computational Biology, Tianjin 300350, China.

Publication information

PLoS One. 2017 Sep 29;12(9):e0185587. doi: 10.1371/journal.pone.0185587. eCollection 2017.

DOI:10.1371/journal.pone.0185587
PMID:28961273
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5621689/
Abstract

Since the importance of DNA-binding proteins in multiple biomolecular functions has been recognized, an increasing number of researchers are attempting to identify DNA-binding proteins. In recent years, as the volume of protein sequence data has soared, machine learning methods have become increasingly compelling because of their favorable speed and accuracy. In this paper, we extract three features from the protein sequence, namely NMBAC (Normalized Moreau-Broto Autocorrelation), PSSM-DWT (Position-Specific Scoring Matrix-Discrete Wavelet Transform), and PSSM-DCT (Position-Specific Scoring Matrix-Discrete Cosine Transform). We also apply a feature selection algorithm to these feature vectors. The selected features are then used to train an SVM (support vector machine) classifier to predict DNA-binding proteins. We use three datasets, namely PDB1075, PDB594 and PDB186, to evaluate the performance of our approach. The PDB1075 and PDB594 datasets are employed for the jackknife test and the PDB186 dataset is used for the independent test. Our method achieves the best accuracy in the jackknife test, from 79.20% to 86.23% and 80.5% to 86.20% on the PDB1075 and PDB594 datasets, respectively. In the independent test, the accuracy of our method reaches 76.3%. The independent test results also show that our method has some ability to be used effectively for DNA-binding protein prediction. The data and source code are available at https://doi.org/10.6084/m9.figshare.5104084.
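The PSSM-DCT idea named in the abstract can be illustrated with a minimal sketch: a PSSM has one row per residue and 20 columns, so its size varies with protein length, and a 2-D discrete cosine transform lets us keep only a fixed block of low-frequency coefficients as a fixed-length feature vector. The abstract does not state how many coefficients the authors retain or how they normalize the matrix, so the function names and the `keep_rows` parameter below are hypothetical choices for illustration, not the authors' implementation.

```python
import math


def dct2_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct2_2d(matrix):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct2_1d(r) for r in matrix]
    cols = [dct2_1d(c) for c in zip(*rows)]   # one list per column
    return [list(r) for r in zip(*cols)]      # transpose back to row-major


def pssm_dct_features(pssm, keep_rows=4):
    """Compress an L x 20 PSSM into a vector of keep_rows * 20 values.

    keep_rows is an assumed setting: only the lowest-frequency rows of
    the DCT coefficient matrix are kept, so proteins of different
    length L yield vectors of the same size.
    """
    if not pssm:
        raise ValueError("PSSM must contain at least one row")
    width = len(pssm[0])
    block = dct2_2d(pssm)[:keep_rows]
    # Zero-pad when the protein is shorter than keep_rows residues.
    while len(block) < keep_rows:
        block.append([0.0] * width)
    return [value for row in block for value in row]


if __name__ == "__main__":
    # Two toy "PSSMs" of different lengths (made-up scores, not real data).
    short = [[float((i + j) % 5) for j in range(20)] for i in range(30)]
    long_ = [[float((i * j) % 7) for j in range(20)] for i in range(75)]
    print(len(pssm_dct_features(short)), len(pssm_dct_features(long_)))
```

Vectors produced this way could then be concatenated with other features and passed to a feature selector and an SVM, as the abstract describes; those steps are omitted here.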


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eaf0/5621689/7b7454093047/pone.0185587.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eaf0/5621689/e86daf22aa11/pone.0185587.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eaf0/5621689/ab5f2af7590f/pone.0185587.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eaf0/5621689/1f06cf5bb85e/pone.0185587.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eaf0/5621689/e04a21dfb7d1/pone.0185587.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eaf0/5621689/72e51fb0e33d/pone.0185587.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eaf0/5621689/f032ae95cc2f/pone.0185587.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eaf0/5621689/3fd3a5e1dc9a/pone.0185587.g008.jpg
