eThread: a highly optimized machine learning-based approach to meta-threading and the modeling of protein tertiary structures.

Affiliation

Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, United States of America.

Publication information

PLoS One. 2012;7(11):e50200. doi: 10.1371/journal.pone.0050200. Epub 2012 Nov 21.

DOI:10.1371/journal.pone.0050200

PMID:23185577

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3503980/

Abstract

Template-based modeling that employs various meta-threading techniques is currently the most accurate, and consequently the most commonly used, approach for protein structure prediction. Despite the evident progress in this field, accurate structure models cannot be constructed for a significant fraction of gene products; thus, the development of new algorithms is required. Here, we describe the development, optimization and large-scale benchmarking of eThread, a highly accurate meta-threading procedure for the identification of structural templates and the construction of corresponding target-to-template alignments. eThread integrates ten state-of-the-art threading/fold recognition algorithms in a local environment and extensively uses various machine learning techniques to carry out fully automated template-based protein structure modeling. Tertiary structure prediction employs two protocols based on widely used modeling algorithms: Modeller and TASSER-Lite. As a part of eThread, we also developed eContact, a Bayesian classifier for the prediction of inter-residue contacts, and eRank, which effectively ranks the multiple protein models generated and provides reliable confidence estimates for structure quality assessment. With closely related templates excluded from the modeling process, eThread generates models that are correct at the fold level for >80% of the targets; 40-50% of the constructed models are of very high quality and would be considered accurate at the family level. Furthermore, in large-scale benchmarking, we compare the performance of eThread to several alternative methods commonly used in protein structure prediction. Finally, we estimate the upper bound for this type of approach and discuss directions for further improvement.

Similar articles

Template-based protein structure prediction in CASP11 and retrospect of I-TASSER in the last decade.

Proteins. 2016 Sep;84 Suppl 1(Suppl 1):233-46. doi: 10.1002/prot.24918. Epub 2015 Sep 18.

The utility of artificially evolved sequences in protein threading and fold recognition.

J Theor Biol. 2013 Jul 7;328:77-88. doi: 10.1016/j.jtbi.2013.03.018. Epub 2013 Mar 27.

A comprehensive assessment of sequence-based and template-based methods for protein contact prediction.

Bioinformatics. 2008 Apr 1;24(7):924-31. doi: 10.1093/bioinformatics/btn069. Epub 2008 Feb 22.

Prediction of protein-protein interaction sites from weakly homologous template structures using meta-threading and machine learning.

J Mol Recognit. 2015 Jan;28(1):35-48. doi: 10.1002/jmr.2410.

TASSER: an automated method for the prediction of protein tertiary structures in CASP6.

Proteins. 2005;61 Suppl 7:91-98. doi: 10.1002/prot.20724.

Unleashing the power of meta-threading for evolution/structure-based function inference of proteins.

Front Genet. 2013 Jun 19;4:118. doi: 10.3389/fgene.2013.00118. eCollection 2013.

Effect of using suboptimal alignments in template-based protein structure prediction.

Proteins. 2011 Jan;79(1):315-34. doi: 10.1002/prot.22885.

Benchmarking of TASSER_2.0: an improved protein structure prediction algorithm with more accurate predicted contact restraints.

Biophys J. 2008 Aug;95(4):1956-64. doi: 10.1529/biophysj.108.129759. Epub 2008 May 16.

Development and large scale benchmark testing of the PROSPECTOR_3 threading algorithm.

Proteins. 2004 Aug 15;56(3):502-18. doi: 10.1002/prot.20106.

Cited by

Updates to Binding MOAD (Mother of All Databases): Polypharmacology Tools and Their Utility in Drug Repurposing.

J Mol Biol. 2019 Jun 14;431(13):2423-2433. doi: 10.1016/j.jmb.2019.05.024. Epub 2019 May 22.

Elucidating the druggability of the human proteome with eFindSite.

J Comput Aided Mol Des. 2019 May;33(5):509-519. doi: 10.1007/s10822-019-00197-w. Epub 2019 Mar 19.

Binding site matching in rational drug design: algorithms and applications.

Brief Bioinform. 2019 Nov 27;20(6):2167-2184. doi: 10.1093/bib/bby078.

eModel-BDB: a database of comparative structure models of drug-target interactions from the Binding Database.

Gigascience. 2018 Aug 1;7(8):giy091. doi: 10.1093/gigascience/giy091.

Large-scale computational drug repositioning to find treatments for rare diseases.

NPJ Syst Biol Appl. 2018 Mar 13;4:13. doi: 10.1038/s41540-018-0050-7. eCollection 2018.

eRepo-ORP: Exploring the Opportunity Space to Combat Orphan Diseases with Existing Drugs.

J Mol Biol. 2018 Jul 20;430(15):2266-2273. doi: 10.1016/j.jmb.2017.12.001. Epub 2017 Dec 10.

Across-proteome modeling of dimer structures for the bottom-up assembly of protein-protein interaction networks.

BMC Bioinformatics. 2017 May 12;18(1):257. doi: 10.1186/s12859-017-1675-z.

Deletion of a Predicted β-Sheet Domain within the Amino Terminus of Herpes Simplex Virus Glycoprotein K Conserved among Alphaherpesviruses Prevents Virus Entry into Neuronal Axons.

J Virol. 2015 Dec 9;90(5):2230-9. doi: 10.1128/JVI.02468-15.

Predicted binding site information improves model ranking in protein docking using experimental and computer-generated target structures.

BMC Struct Biol. 2015 Nov 23;15:23. doi: 10.1186/s12900-015-0050-4.

PDID: database of molecular-level putative protein-drug interactions in the structural human proteome.

Bioinformatics. 2016 Feb 15;32(4):579-86. doi: 10.1093/bioinformatics/btv597. Epub 2015 Oct 26.
