Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany.
Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schaenzlestr. 18, 79104 Freiburg, Germany.
Gigascience. 2021 Aug 18;10(8). doi: 10.1093/gigascience/giab054.
Cross-linking and immunoprecipitation followed by next-generation sequencing (CLIP-seq) is the state-of-the-art technique used to experimentally determine transcriptome-wide binding sites of RNA-binding proteins (RBPs). However, it relies on gene expression, which can be highly variable between conditions and thus cannot provide a complete picture of the RBP binding landscape. This creates a demand for computational methods to predict missing binding sites. Although there exist various methods using traditional machine learning and lately also deep learning, we encountered several problems: many of these are not well documented or maintained, making them difficult to install and use, or are not even available. In addition, there can be efficiency issues, as well as little flexibility regarding options or supported features.
Here, we present RNAProt, an efficient and feature-rich computational RBP binding site prediction framework based on recurrent neural networks. We compare RNAProt with 1 traditional machine learning approach and 2 deep-learning methods, demonstrating its state-of-the-art predictive performance and better run time efficiency. We further show that its implemented visualizations capture known binding preferences and thus can help to understand what is learned. Since RNAProt supports various additional features (including user-defined features, which no other tool offers), we also present their influence on benchmark set performance. Finally, we show the benefits of incorporating additional features, specifically structure information, when learning the binding sites of a hairpin loop-binding RBP.
RNAProt provides a complete framework for RBP binding site prediction, from data set generation through model training to the evaluation of binding preferences and prediction. It offers state-of-the-art predictive performance and superior run time efficiency, while supporting more features and input types than any other tool available so far. RNAProt is easy to install and use, comes with comprehensive documentation, and is accompanied by informative statistics and visualizations. All this makes RNAProt a valuable tool for future RBP binding site research.