Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO, 63132, USA.
Sci Rep. 2020 Aug 7;10(1):13374. doi: 10.1038/s41598-020-70181-0.
Deep learning algorithms are driving progress in protein structure prediction, yet much remains to be studied at the intersection of the two fields. Recent findings show that inter-residue distance prediction, a more granular version of the well-known contact prediction problem, is key to predicting accurate models. However, deep learning methods that predict these distances are still in the early stages of development. Advancing these methods, and developing novel ones, requires a small, representative dataset packaged for fast development and testing. In this work, we introduce protein distance net (PDNET), a framework that consists of one such representative dataset along with scripts for training and testing deep learning methods. The framework also includes all the scripts used to curate the dataset and to generate the input features and distance maps. Deep learning models can also be trained and tested in a web browser using free platforms such as Google Colab. We discuss how PDNET can be used to predict contacts, distance intervals, and real-valued distances.
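The three prediction targets named above (contacts, distance intervals, and real-valued distances) can all be derived from a single inter-residue distance map. The following is a minimal sketch of that relationship, not PDNET's own code: the function names, the toy coordinates, the 8 Å contact threshold (a common convention in contact prediction), and the interval edges are all assumptions made for illustration.

```python
import numpy as np

def distance_map(coords):
    """Pairwise Euclidean distances between residues (real-valued target).

    coords: array of shape (L, 3), one representative atom per residue.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def to_contacts(dist, threshold=8.0):
    """Binary contact map: 1 where the distance is below the threshold.

    The 8 Angstrom default is an assumed convention, not taken from the abstract.
    """
    return (dist < threshold).astype(np.int8)

def to_intervals(dist, edges=(4.0, 8.0, 12.0, 16.0)):
    """Distance-interval labels: index of the bin each distance falls into.

    The edges are hypothetical; a value d gets label k when
    edges[k-1] <= d < edges[k], with 0 below the first edge and
    len(edges) at or above the last one.
    """
    return np.digitize(dist, edges)

# Toy example: four residues placed along a line (coordinates in Angstroms).
coords = np.array(
    [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [20.0, 0.0, 0.0]]
)
dist = distance_map(coords)      # real-valued distances
contacts = to_contacts(dist)     # binary contacts
bins = to_intervals(dist)        # distance-interval classes
```

Contacts and intervals are lossy views of the same map: thresholding reduces each distance to one bit, and binning reduces it to a class label, which is why real-valued distance prediction is the most granular of the three tasks.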