DescFold: a web server for protein fold recognition.

Affiliation

State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China.

Publication information

BMC Bioinformatics. 2009 Dec 14;10:416. doi: 10.1186/1471-2105-10-416.

DOI:10.1186/1471-2105-10-416
PMID:20003426
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2803855/
Abstract

BACKGROUND

Machine learning-based methods have been proven to be powerful in developing new fold recognition tools. In our previous work [Zhang, Kochhar and Grigorov (2005) Protein Science, 14: 431-444], a machine learning-based method called DescFold was established by using Support Vector Machines (SVMs) to combine the following four descriptors: a profile-sequence-alignment-based descriptor using Psi-blast e-values and bit scores, a sequence-profile-alignment-based descriptor using Rps-blast e-values and bit scores, a descriptor based on secondary structure element alignment (SSEA), and a descriptor based on the occurrence of PROSITE functional motifs. In this work, we focus on the improvement of DescFold by incorporating more powerful descriptors and setting up a user-friendly web server.
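As an illustration of how the four descriptors above might be assembled into a single numeric vector for one query-template pair before SVM training, here is a minimal sketch. The class name, field names, the `-log10` transform of e-values, and the e-value floor are all assumptions made for this example; the abstract does not state how the original DescFold encodes its descriptors.

```python
import math
from dataclasses import dataclass

# Assumption: clamp e-values so that an e-value of 0.0 does not break the log.
E_VALUE_FLOOR = 1e-180


@dataclass
class PairDescriptors:
    """Raw descriptor values for one query-template pair (hypothetical layout)."""
    psiblast_evalue: float      # profile-sequence alignment (Psi-blast)
    psiblast_bits: float
    rpsblast_evalue: float      # sequence-profile alignment (Rps-blast)
    rpsblast_bits: float
    ssea_score: float           # secondary structure element alignment
    shared_prosite_motifs: int  # PROSITE motifs occurring in both proteins


def neg_log10_evalue(evalue: float) -> float:
    """Map an e-value to a score that grows as the hit becomes more significant."""
    return -math.log10(max(evalue, E_VALUE_FLOOR))


def to_feature_vector(d: PairDescriptors) -> list[float]:
    """Flatten the descriptors into the fixed-order vector an SVM would consume."""
    return [
        neg_log10_evalue(d.psiblast_evalue),
        d.psiblast_bits,
        neg_log10_evalue(d.rpsblast_evalue),
        d.rpsblast_bits,
        d.ssea_score,
        float(d.shared_prosite_motifs),
    ]


if __name__ == "__main__":
    pair = PairDescriptors(1e-5, 42.0, 1.0, 10.0, 55.0, 2)
    print(to_feature_vector(pair))
```

Vectors of this kind, labeled by whether the pair shares a fold, could then be passed to any SVM trainer. The fixed ordering matters because the trained model assigns one weight or kernel dimension per position.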

RESULTS

To find more powerful descriptors, we first considered the profile-profile alignment score generated by the COMPASS algorithm as a new descriptor (PPA). In a profile-profile alignment between two proteins for fold recognition, one protein is regarded as the template (i.e., its 3D structure is known). Instead of a sequence profile derived from a Psi-blast search, a structure-seeded profile for the template protein was generated by searching for its structural neighbors with the TM-align structural alignment algorithm. The COMPASS algorithm was then used again to derive a profile-structural-profile-alignment-based descriptor (PSPA). We trained and tested the new DescFold on a total of 1,835 highly diverse proteins extracted from SCOP version 1.73. Introducing the PPA and PSPA descriptors substantially boosted fold recognition performance. Using the SCOP_1.73_40% dataset as the fold library, we then constructed the DescFold web server from the trained SVM models. To provide a large-scale test of the new DescFold, a stringent test set of 1,866 proteins was selected from SCOP version 1.75. With the false positive rate controlled at less than 5%, the new DescFold correctly recognizes structural homologs at the fold level for nearly 46% of the test proteins. We also benchmarked the DescFold method against several well-established fold recognition algorithms using the LiveBench targets and the Lindahl dataset.
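The figure quoted at a false positive rate below 5% is a sensitivity measured at a fixed error budget. The sketch below shows one common way to compute such a number from scored predictions: it picks the loosest score threshold whose false positive rate stays strictly below the limit. The function name and the toy data are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def sensitivity_at_fpr(scores, labels, max_fpr=0.05):
    """Return (threshold, sensitivity) for the loosest score threshold whose
    false positive rate stays strictly below max_fpr.

    scores: classifier outputs, higher means more likely a true fold match.
    labels: 1 for a true structural homolog, 0 otherwise.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative examples")

    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best = (float("inf"), 0.0)  # accepting nothing: FPR 0, sensitivity 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Accept all predictions tied at this score together.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / n_neg >= max_fpr:
            break  # this threshold would exceed the error budget
        best = (score, tp / n_pos)
    return best


if __name__ == "__main__":
    # Toy data: 4 true matches and 20 non-matches.
    scores = [0.9, 0.8, 0.3, 0.1, 0.5] + [0.05] * 19
    labels = [1, 1, 1, 1, 0] + [0] * 19
    print(sensitivity_at_fpr(scores, labels))  # (0.8, 0.5)
```

In the toy data, admitting the first non-match already gives a false positive rate of exactly 5%, so the threshold stops at 0.8 and half of the true matches are recovered.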

CONCLUSIONS

Intensive benchmarking showed that the new DescFold method performs very competitively against some well-established fold recognition methods, suggesting that it can serve as a useful tool for template-based protein structure prediction. The DescFold server is freely accessible at http://202.112.170.199/DescFold/index.html.
