Machine learning a model for RNA structure prediction.

Authors

Calonaci Nicola, Jones Alisha, Cuturello Francesca, Sattler Michael, Bussi Giovanni

Affiliations

International School for Advanced Studies, via Bonomea 265, 34136 Trieste, Italy.

Institute of Structural Biology, Helmholtz Zentrum München, Neuherberg 85764, Germany.

Publication information

NAR Genom Bioinform. 2020 Nov 16;2(4):lqaa090. doi: 10.1093/nargab/lqaa090. eCollection 2020 Dec.

Abstract

RNA function crucially depends on its structure. Thermodynamic models currently used for secondary structure prediction rely on computing the partition function of folding ensembles, and can thus estimate minimum free-energy structures and ensemble populations. These models sometimes fail to identify native structures unless complemented by auxiliary experimental data. Here, we build a set of models that combine thermodynamic parameters, chemical probing data (DMS and SHAPE) and co-evolutionary data (direct coupling analysis) through a network that outputs perturbations to the ensemble free energy. Perturbations are trained to increase the ensemble populations of a representative set of known native RNA structures. In the chemical probing nodes of the network, a convolutional window combines neighboring reactivities, shedding light on their structural information content and the contribution of local conformational ensembles. Regularization is used to limit overfitting and improve transferability. The most transferable model is selected through a cross-validation strategy that estimates the performance of models on systems they were not trained on. With the selected model we obtain increased ensemble populations for native structures and more accurate predictions in an independent validation set. The flexibility of the approach allows the model to be easily retrained and adapted to incorporate arbitrary experimental information.
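
The two ideas at the core of the abstract, a sliding window that combines neighboring reactivities into per-nucleotide free-energy perturbations and the resulting population of the native structure, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a tiny, explicitly enumerated ensemble in place of a full partition-function calculation, and the function names (`reactivity_penalties`, `native_population`), the window weights, the bias and the value of `RT` are all hypothetical choices made for illustration.

```python
import numpy as np

# Thermal energy in kcal/mol at roughly 310 K (assumed value for this sketch).
RT = 0.6163


def reactivity_penalties(reactivities, weights, bias=0.0):
    """Combine neighboring reactivities into one penalty per nucleotide.

    A window of odd length slides along the sequence; the ends are
    zero-padded so that the output has one value per nucleotide.
    """
    r = np.asarray(reactivities, dtype=float)
    w = np.asarray(weights, dtype=float)
    if len(w) % 2 == 0:
        raise ValueError("window length must be odd")
    half = len(w) // 2
    padded = np.pad(r, half)  # zero padding on both sides
    # Cross-correlation: out[i] = sum_k w[k] * padded[i + k]
    return np.correlate(padded, w, mode="valid") + bias


def native_population(energies, paired_masks, penalties, native_index):
    """Boltzmann population of one structure in a small enumerated ensemble.

    energies      unperturbed free energies, one per structure (kcal/mol)
    paired_masks  array (n_structures, n_nucleotides), 1 where a nucleotide is paired
    penalties     per-nucleotide perturbation added for every paired nucleotide
    """
    masks = np.asarray(paired_masks, dtype=float)
    e = np.asarray(energies, dtype=float) + masks @ np.asarray(penalties, dtype=float)
    boltz = np.exp(-(e - e.min()) / RT)  # shift by the minimum for numerical stability
    return boltz[native_index] / boltz.sum()


# Toy example: two candidate structures with equal unperturbed free energy.
energies = [0.0, 0.0]
paired_masks = [[1, 1, 0, 0],   # structure 0 (taken as native): nucleotides 1-2 paired
                [0, 0, 1, 1]]   # structure 1: nucleotides 3-4 paired
reactivities = [0.0, 0.0, 1.0, 1.0]  # high reactivity suggests nucleotides 3-4 are unpaired

penalties = reactivity_penalties(reactivities, weights=[0.0, 1.0, 0.0])
print(native_population(energies, paired_masks, penalties, native_index=0))
```

In this toy setting, training would amount to adjusting `weights` and `bias` so that the population returned by `native_population` increases across a set of reference structures, with a regularization term added to the objective. A realistic calculation would replace the enumerated ensemble with a partition function over the full folding ensemble.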

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f117/7671377/9cb956da88a7/lqaa090fig1.jpg
