Department of Bioengineering, Stanford University, Stanford, United States.
Chan Zuckerberg Biohub, San Francisco, United States.
eLife. 2021 Apr 16;10:e59697. doi: 10.7554/eLife.59697.
Ribozyme switches are a class of RNA-encoded genetic switches that support conditional regulation of gene expression across diverse organisms. A better understanding of the relationships between sequence, structure, and activity can improve our capacity for de novo rational design of ribozyme switches. Here, we generated data on the activity of hundreds of thousands of ribozyme sequences. Using automated structural analysis and machine learning, we leveraged these large data sets to develop predictive models that estimate the in vivo gene-regulatory activity of a ribozyme sequence. These models supported the de novo design of ribozyme libraries with low mean basal gene-regulatory activities and of new ribozyme switches that exhibit changes in gene-regulatory activity in the presence of a target ligand, producing functional switches for four out of five aptamers. Our work examines how biases in the model and the data set can arise and affect prediction accuracy, and demonstrates that machine learning can be applied to RNA sequences to predict gene-regulatory activity, providing the basis for design tools for functional RNAs.