Improving Protein Fold Recognition by Deep Learning Networks.

Author information

Jo Taeho, Hou Jie, Eickholt Jesse, Cheng Jianlin

Affiliations

Department of Computer Science, University of Missouri, Columbia, MO 65211, USA.

Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA.

Publication information

Sci Rep. 2015 Dec 4;5:17573. doi: 10.1038/srep17573.

DOI: 10.1038/srep17573

PMID: 26634993

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4669437/

Abstract

For accurate recognition of protein folds, we developed a deep learning network method (DN-Fold) that predicts whether a given query-template protein pair belongs to the same structural fold. The inputs were sequence and structural features extracted from the protein pair. We evaluated DN-Fold alongside 18 other methods on Lindahl's benchmark dataset and on a large benchmark set of about one million protein pairs extracted from SCOP 1.75, at three levels of fold recognition (protein family, superfamily, and fold) defined by the evolutionary distance between the protein sequences. The correct recognition rate of ensemble DN-Fold is 84.5%, 61.5%, and 33.6% for Top 1 predictions and 91.2%, 76.5%, and 60.7% for Top 5 predictions at the family, superfamily, and fold levels, respectively. We also evaluated a single DN-Fold network (DN-FoldS), which achieved results comparable to those of the ensemble DN-Fold at the family and superfamily levels. Finally, we extended fold recognition from a binary classification problem to a real-value regression task, which also showed promising performance. DN-Fold is freely available through a web server at http://iris.rnet.missouri.edu/dnfold.
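The Top 1 and Top 5 recognition rates reported above can be illustrated with a short sketch. It assumes the common pairwise benchmark protocol: each query is scored against every template, templates are ranked by the predicted probability that the pair shares a fold, and a query counts as correctly recognized at Top k if any of its k highest-ranked templates has the same fold. The function name `top_k_recognition_rate`, the dictionary input format, and the SCOP-style labels are hypothetical. The level-specific template filtering used when evaluating at the family, superfamily, and fold levels is not modeled, and this is not DN-Fold's own evaluation code.

```python
from collections import defaultdict


def top_k_recognition_rate(scores, query_fold, template_fold, k):
    """Fraction of queries whose top-k ranked templates include one of the same fold.

    scores: dict mapping (query_id, template_id) -> predicted same-fold probability
    query_fold, template_fold: dicts mapping protein id -> fold label
    (hypothetical input format chosen for illustration)
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    # Group the pairwise predictions by query.
    by_query = defaultdict(list)
    for (query, template), score in scores.items():
        by_query[query].append((score, template))
    if not by_query:
        return 0.0

    hits = 0
    for query, ranked in by_query.items():
        # Rank templates for this query from highest to lowest predicted score.
        ranked.sort(key=lambda pair: pair[0], reverse=True)
        top_templates = [template for _, template in ranked[:k]]
        # Correct if any of the top-k templates shares the query's fold.
        if any(template_fold[t] == query_fold[query] for t in top_templates):
            hits += 1
    return hits / len(by_query)


if __name__ == "__main__":
    # Toy data with hypothetical ids and SCOP-style fold labels.
    scores = {
        ("q1", "t1"): 0.9, ("q1", "t2"): 0.4, ("q1", "t3"): 0.7,
        ("q2", "t1"): 0.8, ("q2", "t2"): 0.3, ("q2", "t3"): 0.6,
    }
    query_fold = {"q1": "a.1", "q2": "b.2"}
    template_fold = {"t1": "a.1", "t2": "b.2", "t3": "c.3"}
    print(top_k_recognition_rate(scores, query_fold, template_fold, k=1))  # 0.5
    print(top_k_recognition_rate(scores, query_fold, template_fold, k=3))  # 1.0
```

In the toy data, q1's best-scoring template t1 shares its fold, while q2's correct template t2 is ranked last, so q2 is only recovered once k reaches 3. This is why Top 5 rates are always at least as high as Top 1 rates.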

Similar articles

Improving Protein Fold Recognition by Deep Learning Networks.

Sci Rep. 2015 Dec 4;5:17573. doi: 10.1038/srep17573.

SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.

BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2.

A machine learning information retrieval approach to protein fold recognition.

Bioinformatics. 2006 Jun 15;22(12):1456-63. doi: 10.1093/bioinformatics/btl102. Epub 2006 Mar 17.

Improving taxonomy-based protein fold recognition by using global and local features.

Proteins. 2011 Jul;79(7):2053-64. doi: 10.1002/prot.23025. Epub 2011 May 2.

Protein fold recognition using the gradient boost algorithm.

Comput Syst Bioinformatics Conf. 2006:43-53.

DeepSF: deep convolutional neural network for mapping protein sequences to folds.

Bioinformatics. 2018 Apr 15;34(8):1295-1303. doi: 10.1093/bioinformatics/btx780.

Improving protein fold recognition by random forest.

BMC Bioinformatics. 2014;15 Suppl 11(Suppl 11):S14. doi: 10.1186/1471-2105-15-S11-S14. Epub 2014 Oct 21.

A fast SCOP fold classification system using content-based E-Predict algorithm.

BMC Bioinformatics. 2006 Jul 26;7:362. doi: 10.1186/1471-2105-7-362.

AutoSCOP: automated prediction of SCOP classifications using unique pattern-class mappings.

Bioinformatics. 2007 May 15;23(10):1203-10. doi: 10.1093/bioinformatics/btm089. Epub 2007 Mar 22.

DescFold: a web server for protein fold recognition.

BMC Bioinformatics. 2009 Dec 14;10:416. doi: 10.1186/1471-2105-10-416.

Cited by

The role of artificial intelligence in drug screening, drug design, and clinical trials.

Front Pharmacol. 2024 Nov 29;15:1459954. doi: 10.3389/fphar.2024.1459954. eCollection 2024.

BioSeq-Diabolo: Biological sequence similarity analysis using Diabolo.

PLoS Comput Biol. 2023 Jun 20;19(6):e1011214. doi: 10.1371/journal.pcbi.1011214. eCollection 2023 Jun.

Protein Function Analysis through Machine Learning.

Biomolecules. 2022 Sep 6;12(9):1246. doi: 10.3390/biom12091246.

Local Alignment of DNA Sequence Based on Deep Reinforcement Learning.

IEEE Open J Eng Med Biol. 2021 Apr 27;2:170-178. doi: 10.1109/OJEMB.2021.3076156. eCollection 2021.

FoldHSphere: deep hyperspherical embeddings for protein fold recognition.

BMC Bioinformatics. 2021 Oct 12;22(1):490. doi: 10.1186/s12859-021-04419-7.

Improving protein fold recognition using triplet network and ensemble deep learning.

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab248.

Evolving scenario of big data and Artificial Intelligence (AI) in drug discovery.

Mol Divers. 2021 Aug;25(3):1439-1460. doi: 10.1007/s11030-021-10256-w. Epub 2021 Jun 23.

Applications of artificial intelligence to drug design and discovery in the big data era: a comprehensive review.

Mol Divers. 2021 Aug;25(3):1643-1664. doi: 10.1007/s11030-021-10237-z. Epub 2021 Jun 10.

A protein structural study based on the centrality analysis of protein sequence feature networks.

PLoS One. 2021 Mar 29;16(3):e0248861. doi: 10.1371/journal.pone.0248861. eCollection 2021.

Why can deep convolutional neural networks improve protein fold recognition? A visual explanation by interpretation.

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab001.
