School of Computer Science & McGill Centre for Bioinformatics, McGill University, Montréal, H3A 0C6 QC, Canada.
Nucleic Acids Res. 2013 Jul;41(Web Server issue):W480-5. doi: 10.1093/nar/gkt461. Epub 2013 Jun 8.
More than a simple carrier of genetic information, messenger RNA (mRNA) coding regions can also harbor functional elements that evolved to control post-transcriptional processes such as mRNA splicing, localization and translation. Functional elements in RNA molecules are often encoded by secondary structure elements. In this article, we introduce Structural Profile Assignment of RNA Coding Sequences (SPARCS), an efficient method to analyze the (secondary) structure profile of protein-coding regions in mRNAs. First, we develop a novel algorithm that uniformly samples the sequence landscape while preserving both the dinucleotide frequency and the encoded amino acid sequence of the input mRNA. Then, we use this algorithm to generate a set of artificial sequences, from which we estimate the Z-score of classical structural metrics such as the sum of base pairing probabilities and the base pairing entropy. Finally, we use these metrics to predict structured and unstructured regions in the input mRNA sequence. We applied our method to the structural profile of the ASH1 genes and recovered key structural elements. A web server implementing this discovery pipeline is available at http://csb.cs.mcgill.ca/sparcs together with the source code of the sampling algorithm.
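The Z-score step described above can be illustrated with a minimal sketch. It is not the SPARCS implementation: the function name `z_score` and the numeric values are hypothetical, the background values stand in for a structural metric (for example, the sum of base pairing probabilities) computed on the sampled artificial sequences, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def z_score(observed, background):
    """Return the Z-score of `observed` relative to `background`.

    `observed` is a structural metric computed on the native mRNA region;
    `background` holds the same metric computed on sampled sequences.
    Assumption: the sample standard deviation (n - 1) is used.
    """
    if len(background) < 2:
        raise ValueError("need at least two background values")
    sd = stdev(background)
    if sd == 0:
        raise ValueError("background values have zero variance")
    return (observed - mean(background)) / sd


# Hypothetical metric values, for illustration only.
native_metric = 14.0
sampled_metrics = [8.0, 9.0, 10.0, 11.0, 12.0]
print(round(z_score(native_metric, sampled_metrics), 2))  # prints 2.53
```

A Z-score far from zero indicates that the native region's metric differs from what is expected for sequences with the same dinucleotide frequency and encoded protein; which sign corresponds to a structured region depends on the metric chosen.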