Computing Science, Simon Fraser University, Burnaby BC, Canada.
Centre for Biological Signalling Studies, University of Freiburg, Freiburg im Breisgau, Germany.
Bioinformatics. 2018 Sep 15;34(18):3101-3110. doi: 10.1093/bioinformatics/bty208.
Long non-coding RNAs (lncRNAs) are defined as transcripts longer than 200 nt that are not translated into proteins. These transcripts are often processed (spliced, capped and polyadenylated), and some are known to have important biological functions. However, most lncRNAs have unknown or poorly understood functions. Nevertheless, because of their potential role in cancer, lncRNAs are receiving a lot of attention, and the need for computational tools to predict their possible mechanisms of action is greater than ever. Fundamentally, most of the known lncRNA mechanisms involve RNA-RNA and/or RNA-protein interactions. Through accurate predictions of each kind of interaction and integration of these predictions, it is possible to elucidate potential mechanisms for a given lncRNA.
Here, we introduce MechRNA, a pipeline for corroborating RNA-RNA interaction prediction and protein binding prediction to identify possible lncRNA mechanisms, either for specific targets or on a transcriptome-wide scale. The first stage uses a version of IntaRNA2 with added functionality for efficient prediction of RNA-RNA interactions with very long input sequences, allowing for large-scale analysis of lncRNA interactions with little or no loss of optimality. The second stage integrates protein binding information pre-computed by GraphProt, for both the lncRNA and the target. The final stage infers the most likely mechanism for each lncRNA/target pair. This is achieved by generating candidate mechanisms from the predicted interactions, the relative locations of these interactions and correlation data, followed by selection of the most likely mechanistic explanation using a combined P-value. We applied MechRNA to a number of recently identified cancer-related lncRNAs (PCAT1, PCAT29 and ARLnc1) and also to two well-studied lncRNAs (PCA3 and 7SL). This led to the identification of hundreds of high-confidence potential targets for each lncRNA and corresponding mechanisms. These predictions include the known competitive mechanism of 7SL with HuR for binding on the tumor suppressor TP53, as well as mechanisms expanding what is known about PCAT1 and ARLnc1 and their targets BRCA2 and AR, respectively. For PCAT1-BRCA2, the mechanism involves competitive binding with HuR, which we confirmed using HuR immunoprecipitation assays.
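To make the final selection step concrete, the following is a minimal sketch of ranking candidate mechanisms by a combined P-value. The abstract does not state how the P-values are combined, so Fisher's method is used here purely as an assumed stand-in. The function names, the mechanism labels and the input P-values are all hypothetical and are not taken from MechRNA.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Assumption: Fisher's method is an illustrative choice only; the source
    does not say which combination rule MechRNA uses.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # Fisher statistic X = -2 * sum(ln p); we keep X/2 for convenience.
    half_stat = -sum(math.log(p) for p in p_values)
    # Chi-square survival function with 2k degrees of freedom has the
    # closed form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def rank_mechanisms(candidates):
    """Return (name, combined p) pairs sorted from most to least significant.

    `candidates` maps a hypothetical mechanism label to the p-values of the
    evidence supporting it (e.g. interaction, protein binding, correlation).
    """
    scored = [(name, fisher_combined_p(ps)) for name, ps in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    example = {
        "competitive": [0.01, 0.02, 0.04],
        "cooperative": [0.40, 0.50, 0.30],
    }
    for name, p in rank_mechanisms(example):
        print(f"{name}: combined p = {p:.4g}")
```

In this sketch the candidate with the smallest combined P-value is reported first. Fisher's method assumes the individual tests are independent, which may not hold for evidence derived from the same pair of transcripts.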
MechRNA is available for download at https://bitbucket.org/compbio/mechrna.
Supplementary data are available at Bioinformatics online.