Predicting and analyzing DNA-binding domains using a systematic approach to identifying a set of informative physicochemical and biochemical properties.

Affiliation

Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan.

Publication information

BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S47. doi: 10.1186/1471-2105-12-S1-S47.

DOI:10.1186/1471-2105-12-S1-S47
PMID:21342579
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3044304/
Abstract

BACKGROUND

Existing methods of predicting DNA-binding proteins use features derived from physicochemical properties to design support vector machine (SVM) based classifiers. Generally, the selection of physicochemical properties and the determination of their corresponding feature vectors rely mainly on known properties of the binding mechanism and on the experience of designers. However, designers face a troublesome problem: some different physicochemical properties have similar vectors representing the 20 amino acids, while some closely related physicochemical properties have dissimilar vectors.
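The vector-similarity problem can be made concrete by comparing two amino acid indices directly rather than by name. The sketch below is an illustration only, not part of the published method: it uses the Kyte-Doolittle hydropathy values as one index, and the `rescaled` and `flipped` vectors are hypothetical indices constructed here to stand in for differently named properties that carry the same information.

```python
from math import sqrt
from statistics import mean

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Kyte-Doolittle hydropathy values, listed in the order of AMINO_ACIDS.
KYTE_DOOLITTLE = [1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
                  3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2]


def pearson(x, y):
    """Pearson correlation between two equally long numeric vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical index: the same property reported on a different scale.
rescaled = [0.5 * v + 2.0 for v in KYTE_DOOLITTLE]
# Hypothetical index: a "hydrophilicity"-style scale with the sign reversed.
flipped = [-v for v in KYTE_DOOLITTLE]

r_rescaled = pearson(KYTE_DOOLITTLE, rescaled)
r_flipped = pearson(KYTE_DOOLITTLE, flipped)
```

Both hypothetical indices have an absolute correlation of 1 with the original, so a classifier gains nothing from including more than one of them, whatever their names suggest.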

RESULTS

This study proposes a systematic approach, named Auto-IDPCPs, that automatically identifies a set of physicochemical and biochemical properties in the AAindex database for designing SVM-based classifiers to predict and analyze DNA-binding domains/proteins. Auto-IDPCPs consists of 1) clustering 531 amino acid indices in AAindex into 20 clusters using a fuzzy c-means algorithm, 2) using an efficient genetic-algorithm-based optimization method, IBCGA, to select an informative feature set of size m to represent sequences, and 3) analyzing the selected features to identify related physicochemical properties that may affect the binding mechanism of DNA-binding domains/proteins. Auto-IDPCPs identified m = 22 features of properties belonging to five clusters for predicting DNA-binding domains, with a five-fold cross-validation accuracy of 87.12%, compared with 86.62% for the existing method PSSM-400. For predicting DNA-binding sequences, an accuracy of 75.50% was obtained using m = 28 features, whereas PSSM-400 has an accuracy of 74.22%. On an independent test data set of DNA-binding domains, Auto-IDPCPs and PSSM-400 have accuracies of 80.73% and 82.81%, respectively. Typical physicochemical properties discovered include hydrophobicity, secondary structure, charge, solvent accessibility, polarity, flexibility, normalized van der Waals volume, and pK (pK-C, pK-N, pK-COOH and pK-a(RCOOH)).
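The clustering step (step 1) can be sketched as follows. This is a minimal pure-Python fuzzy c-means, not the authors' implementation. The `fuzzifier` value of 2.0 (unrelated to the feature-set size m above), the iteration cap, the tolerance, the random initialisation and the toy 2-D points are all assumptions made for illustration. In Auto-IDPCPs each point would be one 20-dimensional amino acid index and `c` would be 20.

```python
import math
import random


def fuzzy_c_means(points, c, fuzzifier=2.0, max_iter=200, tol=1e-8, seed=0):
    """Return (centers, memberships). memberships[i][j] is the degree to
    which point i belongs to cluster j; each row sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Start from random memberships, normalised per point.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    power = -2.0 / (fuzzifier - 1.0)
    centers = []
    for _ in range(max_iter):
        # Centers: means of the points weighted by membership ** fuzzifier.
        centers = []
        for j in range(c):
            w = [u[i][j] ** fuzzifier for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Memberships: proportional to distance ** (-2 / (fuzzifier - 1)).
        change = 0.0
        for i in range(n):
            inv = [max(math.dist(points[i], ctr), 1e-12) ** power
                   for ctr in centers]
            total = sum(inv)
            new_row = [v / total for v in inv]
            change = max(change,
                         max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if change < tol:
            break
    return centers, u


def hard_labels(memberships):
    """Assign each point to its highest-membership cluster."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


# Toy data: two well-separated groups of 2-D points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, memberships = fuzzy_c_means(toy_points, c=2)
labels = hard_labels(memberships)
```

Unlike k-means, each index keeps a graded membership in every cluster, which is useful when a property sits between two groups. The abstract does not say how cluster membership is used in the subsequent feature selection, so that step is left out here.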

CONCLUSIONS

The proposed approach, Auto-IDPCPs, would help designers investigate informative physicochemical and biochemical properties by considering prediction accuracy and analysis of the binding mechanism simultaneously. Auto-IDPCPs may also be applicable to predicting and analyzing other protein functions from sequences.
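Applying such an approach to any sequence-based prediction task starts with turning a variable-length sequence into a fixed-length vector, one value per selected property. The abstract does not state the encoding used, so the sketch below assumes the simplest option, averaging each property over the residues. The two scales are simplified illustrative stand-ins, not AAindex entries, and `encode` is a hypothetical helper name.

```python
# Simplified, illustrative scales (NOT AAindex entries): a +1/-1 side-chain
# charge and a 0/1 hydrophobic flag. Residues not listed count as 0.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}
HYDROPHOBIC = {aa: 1.0 for aa in "AVILMFWC"}


def encode(sequence, scales):
    """Return one feature per scale: the mean scale value over the sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return [sum(scale.get(aa, 0.0) for aa in seq) / len(seq)
            for scale in scales]


# Two positive and two negative residues cancel out; four of the eight
# residues (M, L, A, V) are flagged hydrophobic.
features = encode("MKRLAVDE", [CHARGE, HYDROPHOBIC])  # [0.0, 0.5]
```

Vectors of this form can be passed to any standard classifier. With real AAindex properties, each selected index would contribute one such feature.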

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed8e/3044304/7597c9963d1a/1471-2105-12-S1-S47-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed8e/3044304/e30bd617b148/1471-2105-12-S1-S47-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed8e/3044304/31bd2a5dd44d/1471-2105-12-S1-S47-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed8e/3044304/4350c675b436/1471-2105-12-S1-S47-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed8e/3044304/7321f666aad5/1471-2105-12-S1-S47-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed8e/3044304/9c7fc2cd52dd/1471-2105-12-S1-S47-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed8e/3044304/c8d4a9be07d7/1471-2105-12-S1-S47-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed8e/3044304/ac0a350aacb7/1471-2105-12-S1-S47-8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed8e/3044304/e2316ff5161d/1471-2105-12-S1-S47-9.jpg

Similar articles

1
Predicting and analyzing DNA-binding domains using a systematic approach to identifying a set of informative physicochemical and biochemical properties.
BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S47. doi: 10.1186/1471-2105-12-S1-S47.
2
FRKAS: knowledge acquisition using a fuzzy rule base approach to insight of DNA-binding domains/proteins.
Protein Pept Lett. 2013 Mar;20(3):299-308. doi: 10.2174/0929866511320030008.
3
Computational identification of ubiquitylation sites from protein sequences.
BMC Bioinformatics. 2008 Jul 15;9:310. doi: 10.1186/1471-2105-9-310.
4
ProLoc-GO: utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization.
BMC Bioinformatics. 2008 Feb 1;9:80. doi: 10.1186/1471-2105-9-80.
5
Real value prediction of protein solvent accessibility using enhanced PSSM features.
BMC Bioinformatics. 2008 Dec 12;9 Suppl 12(Suppl 12):S12. doi: 10.1186/1471-2105-9-S12-S12.
6
An ensemble of reduced alphabets with protein encoding based on grouped weight for predicting DNA-binding proteins.
Amino Acids. 2009 Feb;36(2):167-75. doi: 10.1007/s00726-008-0044-7. Epub 2008 Feb 21.
7
Sequence-based prediction of DNA-binding residues in proteins with conservation and correlation information.
IEEE/ACM Trans Comput Biol Bioinform. 2012 Nov-Dec;9(6):1766-75. doi: 10.1109/TCBB.2012.106.
8
Computational methods for ubiquitination site prediction using physicochemical properties of protein sequences.
BMC Bioinformatics. 2016 Mar 3;17:116. doi: 10.1186/s12859-016-0959-z.
9
DP-BINDER: machine learning model for prediction of DNA-binding proteins by fusing evolutionary and physicochemical information.
J Comput Aided Mol Des. 2019 Jul;33(7):645-658. doi: 10.1007/s10822-019-00207-x. Epub 2019 May 23.
10
Prediction of mono- and di-nucleotide-specific DNA-binding sites in proteins using neural networks.
BMC Struct Biol. 2009 May 13;9:30. doi: 10.1186/1472-6807-9-30.

Cited by

1
Benchmarking recent computational tools for DNA-binding protein identification.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae634.
2
DRBpred: A sequence-based machine learning method to effectively predict DNA- and RNA-binding residues.
Comput Biol Med. 2024 Mar;170:108081. doi: 10.1016/j.compbiomed.2024.108081. Epub 2024 Jan 29.
3
Single-Stranded DNA Binding Proteins and Their Identification Using Machine Learning-Based Approaches.
Biomolecules. 2022 Aug 26;12(9):1187. doi: 10.3390/biom12091187.
4
DNAPred_Prot: Identification of DNA-Binding Proteins Using Composition- and Position-Based Features.
Appl Bionics Biomech. 2022 Apr 13;2022:5483115. doi: 10.1155/2022/5483115. eCollection 2022.
5
GIpred: a computational tool for prediction of GIGANTEA proteins using machine learning algorithm.
Physiol Mol Biol Plants. 2022 Jan;28(1):1-16. doi: 10.1007/s12298-022-01130-6. Epub 2022 Jan 24.
6
Use Chou's 5-Step Rule to Predict DNA-Binding Proteins with Evolutionary Information.
Biomed Res Int. 2020 Jul 27;2020:6984045. doi: 10.1155/2020/6984045. eCollection 2020.
7
PredPSD: A Gradient Tree Boosting Approach for Single-Stranded and Double-Stranded DNA Binding Protein Prediction.
Molecules. 2019 Dec 26;25(1):98. doi: 10.3390/molecules25010098.
8
Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences.
BMC Bioinformatics. 2017 Jun 12;18(1):300. doi: 10.1186/s12859-017-1715-8.
9
A hydrophobic spine stabilizes a surface-exposed α-helix according to analysis of the solvent-accessible surface area.
BMC Bioinformatics. 2016 Dec 22;17(Suppl 19):503. doi: 10.1186/s12859-016-1368-z.
10
SCMMTP: identifying and characterizing membrane transport proteins using propensity scores of dipeptides.
BMC Genomics. 2015;16 Suppl 12(Suppl 12):S6. doi: 10.1186/1471-2164-16-S12-S6. Epub 2015 Dec 9.

References

1
A threading-based method for the prediction of DNA-binding proteins with application to the human genome.
PLoS Comput Biol. 2009 Nov;5(11):e1000567. doi: 10.1371/journal.pcbi.1000567. Epub 2009 Nov 13.
2
Predicting DNA- and RNA-binding proteins from sequences with kernel methods.
J Theor Biol. 2009 May 21;258(2):289-93. doi: 10.1016/j.jtbi.2009.01.024. Epub 2009 Feb 6.
3
Computational identification of ubiquitylation sites from protein sequences.
BMC Bioinformatics. 2008 Jul 15;9:310. doi: 10.1186/1471-2105-9-310.
4
Identification of DNA-binding proteins using support vector machines and evolutionary profiles.
BMC Bioinformatics. 2007 Nov 27;8:463. doi: 10.1186/1471-2105-8-463.
5
AAindex: amino acid index database, progress report 2008.
Nucleic Acids Res. 2008 Jan;36(Database issue):D202-5. doi: 10.1093/nar/gkm998. Epub 2007 Nov 12.
6
Predicting DNA-binding proteins: approached from Chou's pseudo amino acid composition and other specific sequence features.
Amino Acids. 2008 Jan;34(1):103-9. doi: 10.1007/s00726-007-0568-2. Epub 2007 Jul 12.
7
POPI: predicting immunogenicity of MHC class I binding peptides by mining informative physicochemical properties.
Bioinformatics. 2007 Apr 15;23(8):942-9. doi: 10.1093/bioinformatics/btm061. Epub 2007 Mar 24.
8
Design of accurate predictors for DNA-binding sites in proteins using hybrid SVM-PSSM method.
Biosystems. 2007 Jul-Aug;90(1):234-41. doi: 10.1016/j.biosystems.2006.08.007. Epub 2006 Aug 23.
9
Predicting rRNA-, RNA-, and DNA-binding proteins from primary structure with support vector machines.
J Theor Biol. 2006 May 21;240(2):175-84. doi: 10.1016/j.jtbi.2005.09.018. Epub 2005 Nov 7.
10
Inheritable genetic algorithm for biobjective 0/1 combinatorial optimization problems and its applications.
IEEE Trans Syst Man Cybern B Cybern. 2004 Feb;34(1):609-20. doi: 10.1109/tsmcb.2003.817090.