Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St Louis, United States.
Center for Science and Engineering Living Systems, Washington University, St Louis, United States.
eLife. 2021 Sep 17;10:e70576. doi: 10.7554/eLife.70576.
The rise of high-throughput experiments has transformed how scientists approach biological questions. The ubiquity of large-scale assays that can test thousands of samples in a day has necessitated the development of new computational approaches to interpret these data. Among these tools, machine learning approaches are increasingly used because they can infer complex nonlinear patterns from high-dimensional data. Despite their effectiveness, machine learning (and in particular deep learning) approaches are not always accessible or easy to implement for those with limited computational expertise. Here we present PARROT, a general framework for training and applying deep learning-based predictors on large protein datasets. Using an internal recurrent neural network architecture, PARROT can tackle both classification and regression tasks while requiring only raw protein sequences as input. We showcase the potential uses of PARROT on three diverse machine learning tasks: predicting phosphorylation sites, predicting the transcriptional activation function of peptides generated by high-throughput reporter assays, and predicting the fibrillization propensity of amyloid beta with data generated by deep mutational scanning. Through these examples, we demonstrate that PARROT is easy to use, performs comparably to state-of-the-art computational tools, and is applicable to a wide array of biological problems.