National Laboratory of Solid State Microstructures, School of Physics, Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing, China.
State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China.
PLoS Comput Biol. 2018 Nov 27;14(11):e1006514. doi: 10.1371/journal.pcbi.1006514. eCollection 2018 Nov.
Quality assessment is essential for the computational prediction and design of RNA tertiary structures. To date, several knowledge-based statistical potentials have been proposed and shown to be effective in identifying native and near-native RNA structures. All of these potentials are based on the inverse Boltzmann formula and differ in the choice of geometrical descriptor, reference state, and training dataset. Departing from conventional statistical potentials, our work explored a 3D convolutional neural network (CNN) as a quality evaluator for RNA 3D structures; the network takes a 3D grid representation of the structure as input, so no features need to be extracted manually. Because RNA structures are evaluated nucleotide by nucleotide, our method can also provide local quality assessment. Two sets of training samples were built: the first included 1 million samples generated by high-temperature molecular dynamics (MD) simulations, and the second included 1 million samples generated by Monte Carlo (MC) structure prediction. Both MD and MC procedures were performed for a non-redundant set of 414 RNAs. From these we assembled two training datasets, one containing only the MD samples and the other containing both the MD and MC samples, and trained one neural network on each, named RNA3DCNN_MD and RNA3DCNN_MDMC, respectively. The former is suited to assessing near-native structures, while the latter is suited to assessing structures that cover a large structural space. We tested the performance of our method and compared it with four traditional scoring functions. On two of the three test datasets, our method performed similarly to the state-of-the-art traditional scoring function; on the third, it was far superior to the other scoring functions. Our method can be downloaded from https://github.com/lijunRNA/RNA3DCNN.
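As an illustration of what a 3D grid representation centered on one nucleotide might look like, the following minimal sketch bins atom coordinates into a cubic occupancy grid. It is not the RNA3DCNN implementation: the function name `voxelize`, the 32 Å box, the 32 bins per axis, and the single atom-count channel are all assumptions chosen for the example, since the abstract does not state the grid parameters or input channels.

```python
import numpy as np


def voxelize(coords, center, box_size=32.0, n_bins=32):
    """Count atoms per cell of a cubic grid centered on `center`.

    All parameters are illustrative assumptions, not values from the paper.
    coords: (N, 3) atom positions in angstroms; center: (3,) grid center.
    """
    coords = np.asarray(coords, dtype=float)
    center = np.asarray(center, dtype=float)
    grid = np.zeros((n_bins, n_bins, n_bins), dtype=np.float32)
    if coords.size == 0:
        return grid

    cell = box_size / n_bins
    # Shift so the lower corner of the box is the origin, then convert
    # each coordinate to an integer cell index along every axis.
    idx = np.floor((coords - center + box_size / 2.0) / cell).astype(int)

    # Keep only atoms that fall inside the box.
    inside = np.all((idx >= 0) & (idx < n_bins), axis=1)

    # np.add.at accumulates correctly when several atoms share a cell.
    np.add.at(grid, tuple(idx[inside].T), 1.0)
    return grid


# Hypothetical usage: three atoms, one of them far outside the box.
atoms = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [100.0, 0.0, 0.0]]
grid = voxelize(atoms, center=[0.0, 0.0, 0.0])
print(grid.shape, grid.sum())
```

A grid like this, computed once per nucleotide, could serve as the input tensor of a 3D CNN that scores that nucleotide's local environment; a practical representation would likely add further channels and align the box to a local reference frame.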