FTWSVM-SR: DNA-Binding Proteins Identification via Fuzzy Twin Support Vector Machines on Self-Representation.

Affiliations

School of Internet of Things Engineering, Jiangnan University, Wuxi, 214122, People's Republic of China.

Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, People's Republic of China.

Publication information

Interdiscip Sci. 2022 Jun;14(2):372-384. doi: 10.1007/s12539-021-00489-6. Epub 2021 Nov 6.

DOI:10.1007/s12539-021-00489-6
PMID:34743286
Abstract

Because experimental detection of DNA-binding proteins (DBPs) is costly, many machine learning (ML) algorithms have been used to process and detect DBPs at large scale. Previous methods did not account for noisy samples. In this study, a fuzzy twin support vector machine (FTWSVM) is employed to detect DBPs. First, multiple types of protein sequence features are converted into kernel matrices. Then, a multiple kernel learning (MKL) algorithm linearly combines the kernels. Next, a self-representation (SR)-based membership function estimates the membership value (weight) of each training sample. Finally, the integrated kernel matrix and the membership values are fed into the FTWSVM-SR model for training and testing. Compared with other predictive models, FTWSVM based on SR (FTWSVM-SR) achieves the best Matthews correlation coefficient (MCC): 0.7410 and 0.5909 on two independent test sets (PDB186 and PDB2272), respectively. The results confirm that our method can be an effective tool for DBP detection. Our model can screen and analyze DBPs on a large scale before biochemical experiments are carried out.
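Two of the steps named in the abstract, the linear combination of several kernel matrices and the MCC used for evaluation, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the kernel weights are learned or which kernel function is used, so the fixed weights, the Gaussian (RBF) kernel, the toy feature vectors, and the function names `rbf_kernel`, `combine_kernels`, and `mcc` are all assumptions made for illustration. The self-representation membership function and the FTWSVM solver are not covered here.

```python
import math


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for a list of feature vectors.

    The choice of an RBF kernel is an assumption; the abstract only says
    that sequence features are turned into kernel matrices.
    """
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K


def combine_kernels(kernels, weights):
    """Linear combination K = sum_m w_m * K_m.

    Weights are required to be non-negative and to sum to 1. In a real MKL
    method the weights are learned; here they are supplied by the caller.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel matrix")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example: two hypothetical feature "views" of the same three proteins.
view_a = [[0.1, 0.2], [0.4, 0.1], [0.9, 0.8]]
view_b = [[1.0], [0.5], [0.0]]

K_a = rbf_kernel(view_a, gamma=1.0)
K_b = rbf_kernel(view_b, gamma=0.5)

# Equal weights are a placeholder for weights an MKL algorithm would learn.
K_combined = combine_kernels([K_a, K_b], [0.5, 0.5])
```

The combined matrix could then be passed to any classifier that accepts a precomputed kernel. MCC is a reasonable metric for this task because it stays informative when the positive and negative classes are imbalanced.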

