Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, United States of America.
Department of Education, Stanford University, Stanford, CA, United States of America.
PLoS Comput Biol. 2019 Jun 27;15(6):e1007059. doi: 10.1371/journal.pcbi.1007059. eCollection 2019 Jun.
Emerging RNA-based approaches to disease detection and gene therapy require RNA sequences that fold into specific base-pairing patterns, but computational algorithms generally remain inadequate for these secondary structure design tasks. The Eterna project has crowdsourced RNA design to human video game players in the form of puzzles that reach extraordinary difficulty. Here, we demonstrate that Eterna participants' moves and strategies can be leveraged to improve automated computational RNA design. We present an eternamoves-large repository consisting of 1.8 million player moves on 12 of the most-played Eterna puzzles, as well as an eternamoves-select repository of 30,477 moves from the top 72 players on a select set of more advanced puzzles. On eternamoves-select, we present a multilayer convolutional neural network (CNN), EternaBrain, that achieves test accuracies of 51% and 34% in base prediction and location prediction, respectively, suggesting that top players' moves are partially stereotyped. Pipelining this CNN's move predictions with single-action-playout (SAP) of six strategies compiled by human players solves 61 out of 100 independent puzzles in the Eterna100 benchmark. EternaBrain-SAP outperforms previously published RNA design algorithms and performs as well as or better than a newer generation of deep learning methods, while being largely orthogonal to these other methods. Our study provides useful lessons for future efforts to achieve human-competitive performance with automated RNA design algorithms.