Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences.

Authors

Wang Wei, Sun Lin, Zhang Shiguang, Zhang Hongjun, Shi Jinling, Xu Tianhe, Li Keliang

Affiliations

College of Computer and Information Engineering, Henan Normal University, Xinxiang, Henan Province, 453007, China.

Laboratory of Computation Intelligence and Information Processing, Engineering Technology Research Center for Computing Intelligence and Data Mining, Xinxiang, Henan Province, 453007, China.

Publication information

BMC Bioinformatics. 2017 Jun 12;18(1):300. doi: 10.1186/s12859-017-1715-8.

DOI:10.1186/s12859-017-1715-8
PMID:28606086
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5469069/
Abstract

BACKGROUND

DNA-binding proteins perform important functions in a great number of biological activities. They can interact with single-stranded DNA (ssDNA) or double-stranded DNA (dsDNA) and are accordingly categorized as single-stranded DNA-binding proteins (SSBs) or double-stranded DNA-binding proteins (DSBs). Identifying DNA-binding proteins from amino acid sequences can help to annotate protein functions and to understand binding specificity. In this study, we systematically consider a variety of schemes to represent protein sequences: overall amino acid composition (OAAC) features, dipeptide compositions, position-specific scoring matrix (PSSM) profiles, and split amino acid composition (SAA). We then adopt support vector machine (SVM) and random forest (RF) classification models to distinguish SSBs from DSBs.
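
The composition-based representations named above can be illustrated with a short sketch. This is not the authors' implementation: the function names are hypothetical, the 25-residue terminal windows used for the split composition are an assumed parameter rather than a value taken from the abstract, and PSSM profiles are omitted because they require an external alignment tool.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq):
    """Fraction of each standard residue in seq (20 values)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each ordered pair of adjacent standard residues (400 values)."""
    seq = seq.upper()
    # Pairs that contain a non-standard symbol are skipped, not bridged.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)
             if seq[i] in AMINO_ACIDS and seq[i + 1] in AMINO_ACIDS]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = Counter(pairs)
    return [counts[d] / len(pairs) for d in DIPEPTIDES]


def split_amino_acid_composition(seq, n_term=25, c_term=25):
    """Composition of the N-terminal, middle, and C-terminal parts (60 values).

    The window sizes n_term and c_term are assumptions made for illustration.
    """
    seq = seq.upper()
    c_start = max(n_term, len(seq) - c_term)
    parts = [seq[:n_term], seq[n_term:c_start], seq[c_start:]]
    features = []
    for part in parts:
        features.extend(amino_acid_composition(part))
    return features


def sequence_features(seq):
    """Concatenate the three representations into one 480-value vector."""
    return (amino_acid_composition(seq)
            + dipeptide_composition(seq)
            + split_amino_acid_composition(seq))


# Toy sequence, invented for illustration only.
example = sequence_features("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
```

Each function returns fractions rather than raw counts, so proteins of different lengths map to vectors on the same scale before being passed to a classifier.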

RESULTS

Our results suggest that some sequence features can significantly differentiate DSBs and SSBs. Evaluated by 10-fold cross-validation on the benchmark datasets, our prediction method achieves an accuracy of 88.7% and an area under the curve (AUC) of 0.919. Moreover, our method performs well in independent testing.
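
For readers unfamiliar with the evaluation protocol, the following sketch shows how 10-fold splits, accuracy, and AUC can be computed in plain Python. It is a sketch under stated assumptions, not the authors' pipeline: the function names are hypothetical, the label coding (1 for SSB, 0 for DSB) is assumed, and the toy labels and scores are invented. The SVM and RF classifiers themselves are not reproduced here.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative sample; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: assumed coding 1 = SSB, 0 = DSB; the scores are invented.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]
toy_accuracy = accuracy(labels, predictions)
toy_auc = auc(labels, scores)
```

In a full cross-validation run, a classifier would be trained on each `train` index set and scored on the matching `test` set, and the metrics would then be computed over the pooled or averaged test predictions.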

CONCLUSIONS

Using various sequence-derived features, we propose a novel method to distinguish DSBs from SSBs accurately. The method also explores novel features, which could help to uncover the binding specificity of DNA-binding proteins.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3996/5469069/1766111b2ef9/12859_2017_1715_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3996/5469069/8e8857165f36/12859_2017_1715_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3996/5469069/723897aa15fe/12859_2017_1715_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3996/5469069/a93687c4d69d/12859_2017_1715_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3996/5469069/97cbb9ed3bd4/12859_2017_1715_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3996/5469069/f51607234ae5/12859_2017_1715_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3996/5469069/2a0fc5b2e881/12859_2017_1715_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3996/5469069/02e546ebf59a/12859_2017_1715_Fig8_HTML.jpg
