NetSurfP-3.0: accurate and fast prediction of protein structural features by protein language models and deep learning.

Affiliations

Department of Health Technology, Technical University of Denmark, DK Lyngby, Denmark.

Center for Evolutionary Hologenomics, GLOBE Institute, University of Copenhagen, Denmark.

Publication information

Nucleic Acids Res. 2022 Jul 5;50(W1):W510-W515. doi: 10.1093/nar/gkac439.

Abstract

Recent advances in machine learning and natural language processing have made it possible to profoundly advance our ability to accurately predict protein structures and their functions. While such improvements are significantly impacting the fields of biology and biotechnology at large, such methods have the downside of high demands in terms of computing power and runtime, hampering their applicability to large datasets. Here, we present NetSurfP-3.0, a tool for predicting solvent accessibility, secondary structure, structural disorder and backbone dihedral angles for each residue of an amino acid sequence. This NetSurfP update exploits recent advances in pre-trained protein language models to drastically improve the runtime of its predecessor by two orders of magnitude, while displaying similar prediction performance. We assessed the accuracy of NetSurfP-3.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features, with a runtime that is up to 600 times faster than the most commonly available methods performing the same tasks. The tool is freely available as a web server with a user-friendly interface to navigate the results, as well as a standalone downloadable package.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f5c5/9252760/3d93c192e5fb/gkac439figgra1.jpg
