Ruhr-University Bochum, Faculty of Biology and Biotechnology, Microbial Biology, Bochum, Germany.
Ruhr-University Bochum, Faculty of Biology and Biotechnology, Bioinformatics Group, Bochum, Germany.
PLoS Comput Biol. 2022 Jul 7;18(7):e1010240. doi: 10.1371/journal.pcbi.1010240. eCollection 2022 Jul.
It is well established that neural networks can predict or identify structural motifs of non-coding RNAs (ncRNAs). Yet the neural-network-based identification of RNA structural motifs is limited by the availability of training data, which are often insufficient for learning the features of specific ncRNA families or structural motifs. Aiming to reliably identify intrinsic transcription terminators in bacteria, we introduce a novel pre-training approach that uses inverse folding to generate training data for predicting or identifying a specific family or structural motif of ncRNA. We assess the ability of neural networks to identify secondary structure by systematic in silico mutagenesis experiments. In a study to identify intrinsic transcription terminators as functionally well-understood RNA structural motifs, our inverse-folding-based pre-training approach significantly boosts the performance of neural network topologies, which outperform previous approaches to identify intrinsic transcription terminators. Inverse-folding-based pre-training provides a simple yet highly effective way to integrate the well-established thermodynamic energy model into deep neural networks for identifying ncRNA families or motifs. The pre-training technique is broadly applicable to a range of network topologies as well as to different types of ncRNA families and motifs.
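The systematic in silico mutagenesis mentioned above can be illustrated with a minimal sketch: every position of a sequence is substituted with each alternative nucleotide, and the change in a model score is recorded. The abstract does not specify the protocol, so the function names, the example sequence, and in particular `toy_stem_score` (a hypothetical stand-in for a trained network that merely counts canonical and wobble pairs in a fixed hairpin stem) are assumptions made for illustration, not the authors' implementation.

```python
from typing import Callable, Dict, Tuple

BASES = "ACGU"
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def mutagenesis_scan(
    seq: str, score_fn: Callable[[str], float]
) -> Dict[Tuple[int, str], float]:
    """Score change for every single-nucleotide substitution of seq."""
    baseline = score_fn(seq)
    effects = {}
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            effects[(i, alt)] = score_fn(mutant) - baseline
    return effects


def toy_stem_score(seq: str, stem_len: int = 4, loop_len: int = 4) -> float:
    """Hypothetical stand-in for a classifier: counts paired positions in a
    hairpin that starts at the 5' end (stem, loop, complementary stem)."""
    paired = 0
    for k in range(stem_len):
        i = k
        j = 2 * stem_len + loop_len - 1 - k
        if (seq[i], seq[j]) in PAIRS:
            paired += 1
    return float(paired)


# Assumed toy terminator-like sequence: GC-rich hairpin followed by a U-tract.
seq = "GGCCGAAAGGCCUUUU"
effects = mutagenesis_scan(seq, toy_stem_score)

# Positions whose mutation lowers the score mark the structure the score relies on.
sensitive = sorted({pos for (pos, _), delta in effects.items() if delta < 0})
```

With this toy score, only the stem positions appear in `sensitive`, while loop and U-tract substitutions leave the score unchanged. Replacing `toy_stem_score` with the prediction function of a trained model would give the analogous sensitivity profile for that model.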