Zhu Qiyao, Schlick Tamar
Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, United States.
Department of Chemistry, New York University, New York, New York 10003, United States.
J Phys Chem B. 2021 Feb 4;125(4):1144-1155. doi: 10.1021/acs.jpcb.0c10685. Epub 2021 Jan 20.
Novel RNA motif design is of great practical importance for technology and medicine. Increasingly, computational design plays an important role in such efforts. Our coarse-grained RAG (RNA-As-Graphs) framework offers strategies for enumerating the universe of RNA 2D folds, selecting "RNA-like" candidates for design, and determining sequences that fold onto these candidates. In RAG, RNA secondary structures are represented as tree or dual graphs. Graphs with known RNA structures are called "existing", and the others are labeled "hypothetical". By using simplified features for RNA graphs, we have clustered the hypothetical graphs into "RNA-like" and "non-RNA-like" groups and proposed RNA-like graphs as candidates for design. Here, we propose a new way of designing graph features by using Fiedler vectors. The new features reflect graph shapes better, and they lead to a more clustered organization of existing graphs. We show significant increases in K-means clustering accuracy by using the new features (e.g., up to 95% and 98% accuracy for tree and dual graphs, respectively). In addition, we propose a scoring model for selecting top graph candidates. This scoring model allows users to set a threshold for candidates, and it incorporates weighting of existing graphs based on their corresponding number of known RNAs. We include a list of top-scoring RNA-like candidates, which we hope will stimulate future novel RNA design.
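For readers unfamiliar with Fiedler vectors, the following minimal sketch shows how one can be computed for a small graph. The Fiedler vector is the eigenvector of the graph Laplacian L = D - A associated with its second-smallest eigenvalue (the algebraic connectivity). The helper names `fiedler_vector`, `path_graph`, and `star_graph` and the two toy 4-vertex graphs are illustrative assumptions. The abstract does not describe how the graph features are built from Fiedler vectors, so this sketch stops at the vector itself.

```python
import numpy as np


def fiedler_vector(adjacency):
    """Return (algebraic connectivity, Fiedler vector) of an undirected graph.

    Hypothetical helper for illustration; not taken from the RAG software.
    """
    a = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(a.sum(axis=1)) - a  # L = D - A
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    # Index 0 is the trivial eigenvalue 0 of a connected graph; index 1 is the Fiedler pair.
    return eigenvalues[1], eigenvectors[:, 1]


def path_graph(n):
    """Adjacency matrix of a linear chain of n vertices."""
    a = np.zeros((n, n))
    for i in range(n - 1):
        a[i, i + 1] = a[i + 1, i] = 1.0
    return a


def star_graph(n):
    """Adjacency matrix of a star: vertex 0 joined to n - 1 leaves."""
    a = np.zeros((n, n))
    a[0, 1:] = 1.0
    a[1:, 0] = 1.0
    return a


if __name__ == "__main__":
    # Two toy trees with the same vertex count but different shapes.
    for name, graph in (("path", path_graph(4)), ("star", star_graph(4))):
        value, vector = fiedler_vector(graph)
        print(name, round(float(value), 4), np.round(vector, 4))
```

The two toy trees have the same number of vertices but different spectra, which illustrates why spectral quantities can separate graph shapes. Note that an eigenvector is defined only up to sign, and for the star the second eigenvalue is repeated, so its Fiedler vector is not unique. Any feature built on these vectors has to account for both ambiguities.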