The University of Tokyo, Tokyo, Japan.
Preferred Networks, Inc., Tokyo, Japan.
PLoS One. 2024 Oct 1;19(10):e0310814. doi: 10.1371/journal.pone.0310814. eCollection 2024.
The design of RNA plays a crucial role in developing RNA vaccines, nucleic acid therapeutics, and innovative biotechnological tools. However, existing techniques often lack versatility across tasks and depend on pre-defined secondary structures or other prior knowledge. To address these limitations, we introduce GenerRNA, a Transformer-based model inspired by the success of large language models (LLMs) in protein and molecule generation. GenerRNA is pre-trained on large-scale RNA sequences and is capable of generating novel RNA sequences with stable secondary structures while remaining distinct from existing sequences, thereby expanding our exploration of the RNA space. Moreover, GenerRNA can be fine-tuned on smaller, specialized datasets for specific subtasks, enabling the generation of RNAs with desired functionalities or properties without requiring any prior knowledge as input. As a demonstration, we fine-tuned GenerRNA and successfully generated novel RNA sequences exhibiting high affinity for target proteins. Our work is the first application of a generative language model to RNA generation, presenting an innovative approach to RNA design.
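The core idea the abstract describes is autoregressive language modeling over nucleotide sequences: a model learns P(next token | prefix) from RNA data and samples new sequences token by token. The toy sketch below illustrates only that sampling loop, with a character-level bigram model standing in for GenerRNA's Transformer; all names, the training sequences, and the bigram substitution are inventions of this sketch, not details from the paper.

```python
import random

# Tokens: the four ribonucleotides plus an end-of-sequence marker.
ALPHABET = ["A", "C", "G", "U", "<eos>"]

def train_bigram(sequences):
    """Estimate P(next token | previous token) from counts (add-one smoothed)."""
    counts = {p: {t: 1 for t in ALPHABET} for p in ALPHABET[:-1] + ["<bos>"]}
    for seq in sequences:
        prev = "<bos>"
        for tok in list(seq) + ["<eos>"]:
            counts[prev][tok] += 1
            if tok != "<eos>":
                prev = tok
    return {p: {t: c / sum(row.values()) for t, c in row.items()}
            for p, row in counts.items()}

def sample(model, max_len=30, seed=0):
    """Generate one sequence autoregressively: draw each token from the
    conditional distribution given the previous one, stop at <eos>."""
    rng = random.Random(seed)
    out, prev = [], "<bos>"
    for _ in range(max_len):
        tokens, probs = zip(*model[prev].items())
        tok = rng.choices(tokens, weights=probs)[0]
        if tok == "<eos>":
            break
        out.append(tok)
        prev = tok
    return "".join(out)

# Invented toy "pre-training" corpus; the real model trains on large-scale RNA data.
model = train_bigram(["GGCACUUCGGUGCC", "AUGGCCAUU", "GCGCUUAAGCGC"])
print(sample(model))
```

Fine-tuning corresponds to re-estimating the same conditional distributions on a smaller task-specific corpus (e.g. aptamers binding a target protein), which biases sampling toward sequences with the desired property.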