Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, California 90089, USA.
Department of Electrical Engineering, Stanford University, Stanford, California 94305, USA.
Genome Res. 2022 May;32(5):968-985. doi: 10.1101/gr.275979.121. Epub 2022 Mar 24.
The recent development and application of methods based on the general principle of "crosslinking and proximity ligation" (crosslink-ligation) are revolutionizing RNA structure studies in living cells. However, extracting structure information from such data presents unique challenges. Here, we introduce a set of computational tools for the systematic analysis of data from a wide variety of crosslink-ligation methods, focusing specifically on read mapping, alignment classification, and clustering. We design a new strategy to map short reads with irregular gaps at high sensitivity and specificity. Analysis of previously published data reveals distinct properties and biases caused by the crosslinking reactions. We perform rigorous and exhaustive classification of alignments and discover eight types of arrangements that provide distinct information on RNA structures and interactions. To deconvolve the dense and intertwined gapped alignments, we develop a network/graph-based tool, Crosslinked RNA Secondary Structure Analysis using Network Techniques (CRSSANT), which enables clustering of gapped alignments and discovery of new alternative and dynamic conformations. We discover that multiple crosslinking and ligation events can occur on the same RNA, generating multisegment alignments that report complex high-level RNA structures and multi-RNA interactions. We find that alignments with overlapped segments are produced from potential homodimers and develop a new method for their de novo identification. Analysis of overlapping alignments reveals potential new homodimers in cellular noncoding RNAs and RNA virus genomes in the family. Together, this suite of computational tools enables rapid and efficient analysis of RNA structure and interaction data in living cells.
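To make the idea of network/graph-based clustering of gapped alignments concrete, here is a minimal sketch in Python. It is not the CRSSANT algorithm, whose details are not given in this abstract. It assumes each gapped alignment is reduced to two arms with closed coordinates `(left_start, left_end, right_start, right_end)`, links two alignments when both arms overlap, and reports connected components as clusters. The function names, the tuple layout, and the `min_overlap` parameter are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two closed intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def cluster_gapped_alignments(alignments, min_overlap=1):
    """Group two-arm alignments into connected components.

    Assumption (illustrative, not from the paper): two alignments are
    connected when their left arms overlap AND their right arms overlap
    by at least `min_overlap` nucleotides.
    Returns a sorted list of clusters, each a list of alignment indices.
    """
    parent = list(range(len(alignments)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (l1s, l1e, r1s, r1e) in enumerate(alignments):
        for j in range(i + 1, len(alignments)):
            l2s, l2e, r2s, r2e = alignments[j]
            if (overlap(l1s, l1e, l2s, l2e) >= min_overlap
                    and overlap(r1s, r1e, r2s, r2e) >= min_overlap):
                parent[find(i)] = find(j)  # merge the two components

    groups = defaultdict(list)
    for i in range(len(alignments)):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Toy data: alignments 0 and 1 share both arms; alignment 2 shares only
# the left arm with them; alignments 3 and 4 form a separate duplex.
alignments = [
    (10, 30, 100, 120),
    (15, 35, 105, 125),
    (12, 28, 200, 220),
    (300, 320, 400, 420),
    (310, 330, 410, 430),
]
clusters = cluster_gapped_alignments(alignments)
print(clusters)  # [[0, 1], [2], [3, 4]]
```

Alignment 2 lands in its own cluster because only one of its arms overlaps the first group, which is how a sketch like this separates alternative conformations that share one strand. A production tool would likely weight edges by overlap and use a more selective clustering method than plain connected components.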