DoBo: Protein domain boundary prediction by integrating evolutionary signals and machine learning.

Affiliation

Department of Computer Science, University of Missouri, Columbia, MO 65211, USA.

Publication information

BMC Bioinformatics. 2011 Feb 1;12:43. doi: 10.1186/1471-2105-12-43.

DOI:10.1186/1471-2105-12-43
PMID:21284866
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3036623/
Abstract

BACKGROUND

Accurate identification of protein domain boundaries is useful for protein structure determination and prediction. However, predicting protein domain boundaries from a sequence is still very challenging and largely unsolved.

RESULTS

We developed a new method to integrate the classification power of machine learning with evolutionary signals embedded in protein families in order to improve protein domain boundary prediction. The method first extracts putative domain boundary signals from a multiple sequence alignment between a query sequence and its homologs. The putative sites are then classified and scored by support vector machines in conjunction with input features such as sequence profiles, secondary structures, solvent accessibilities around the sites and their positions. The method was evaluated on a domain benchmark by 10-fold cross-validation and 60% of true domain boundaries can be recalled at a precision of 60%. The trade-off between the precision and recall can be adjusted according to specific needs by using different decision thresholds on the domain boundary scores assigned by the support vector machines.
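
The precision/recall trade-off described above can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate scores and labels below are made-up values standing in for SVM decision scores on putative boundary sites, and `precision_recall` is a hypothetical helper name. It only shows how moving a decision threshold changes which sites are reported.

```python
from typing import List, Tuple


def precision_recall(
    scores: List[float], labels: List[bool], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when sites scoring >= threshold are called boundaries."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    # Convention (an assumption): precision is 1.0 when nothing is predicted.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical decision scores for seven candidate sites, and whether each
# candidate is a true domain boundary (illustrative data, not from the paper).
scores = [1.8, 1.2, 0.7, 0.3, -0.1, -0.6, -1.4]
labels = [True, True, False, True, False, True, False]

# A strict threshold favours precision; a lenient one favours recall.
for threshold in (1.0, 0.0, -1.0):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:+.1f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, lowering the threshold from 1.0 to -1.0 raises recall from 0.50 to 1.00 while precision falls from 1.00 to about 0.67, which is the kind of adjustment the abstract refers to.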

CONCLUSIONS

The good prediction accuracy and the flexibility of selecting domain boundary sites at different precision and recall values make our method a useful tool for protein structure determination and modelling. The method is available at http://sysbio.rnet.missouri.edu/dobo/.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5258/3036623/c8b9d30a20c0/1471-2105-12-43-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5258/3036623/e14da0cd712e/1471-2105-12-43-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5258/3036623/22de126aa290/1471-2105-12-43-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5258/3036623/ae5fdc275469/1471-2105-12-43-3.jpg
