Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri; Center for Science and Engineering Living Systems (CSELS), St. Louis, Missouri; Center for Engineering Mechanobiology, Washington University, St. Louis, Missouri.
Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri; Center for Science and Engineering Living Systems (CSELS), St. Louis, Missouri.
Biophys J. 2021 Oct 19;120(20):4312-4319. doi: 10.1016/j.bpj.2021.08.039. Epub 2021 Sep 2.
Intrinsically disordered proteins and protein regions make up a substantial fraction of many proteomes in which they play a wide variety of essential roles. A critical first step in understanding the role of disordered protein regions in biological function is to identify those disordered regions correctly. Computational methods for disorder prediction have emerged as a core set of tools to guide experiments, interpret results, and develop hypotheses. Given the many predictors available, consensus scores have emerged as a popular approach to mitigate the biases or limitations of any single method. Consensus scores integrate the outcome of multiple independent disorder predictors and provide a per-residue value that reflects the number of tools that predict a residue to be disordered. Although consensus scores help mitigate the inherent problems of using any single disorder predictor, they are computationally expensive to generate. They also necessitate the installation of multiple different software tools, which can be prohibitively difficult. To address this challenge, we developed a deep-learning-based predictor of consensus disorder scores. Our predictor, metapredict, utilizes a bidirectional recurrent neural network trained on the consensus disorder scores from 12 proteomes. By benchmarking metapredict using two orthogonal approaches, we found that metapredict is among the most accurate disorder predictors currently available. Metapredict is also remarkably fast, enabling proteome-scale disorder prediction in minutes. Importantly, metapredict is fully open source and is distributed as a Python package, a collection of command-line tools, and a web server, maximizing the potential practical utility of the predictor. We believe metapredict offers a convenient, accessible, accurate, and high-performance predictor for single proteins and proteomes alike.
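As a rough illustration of the consensus-score idea described in the abstract, the sketch below computes a per-residue value from the binary calls of several predictors. It is not the metapredict implementation or the procedure used to build its training data: the function name `consensus_disorder`, the three example predictors, and the choice to normalize the count to a fraction between 0 and 1 are all assumptions made for illustration.

```python
from typing import List, Sequence


def consensus_disorder(calls: Sequence[Sequence[bool]]) -> List[float]:
    """Return, for each residue, the fraction of predictors calling it disordered.

    `calls` holds one sequence of per-residue booleans per predictor
    (True = predicted disordered). Hypothetical helper, for illustration only.
    """
    if not calls:
        raise ValueError("need at least one predictor")
    length = len(calls[0])
    if any(len(c) != length for c in calls):
        raise ValueError("all predictors must cover the same residues")
    n_predictors = len(calls)
    # Count the predictors voting "disordered" at each position, then normalize.
    return [sum(c[i] for c in calls) / n_predictors for i in range(length)]


# Made-up binary calls from three hypothetical predictors on a 5-residue sequence.
calls = [
    [True, True, False, False, True],
    [True, False, False, False, True],
    [True, True, False, True, True],
]
scores = consensus_disorder(calls)
```

Each real predictor in such a scheme has to be installed and run separately before its calls can be combined, which is the cost the abstract says a single trained network is meant to avoid.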