Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Finland.
BMC Bioinformatics. 2013;14 Suppl 5(Suppl 5):S15. doi: 10.1186/1471-2105-14-S5-S15. Epub 2013 Apr 10.
Through transcription and alternative splicing, a gene can be transcribed into different RNA sequences (isoforms), depending on the individual, on the tissue the cell is in, or in response to some stimuli. Recent RNA-Seq technology allows for new high-throughput ways for isoform identification and quantification based on short reads, and various methods have been put forward for this non-trivial problem.
In this paper we propose a novel radically different method based on minimum-cost network flows. This has a two-fold advantage: on the one hand, it translates the problem as an established one in the field of network flows, which can be solved in polynomial time, with different existing solvers; on the other hand, it is general enough to encompass many of the previous proposals under the least sum of squares model. Our method works as follows: in order to find the transcripts which best explain, under a given fitness model, a splicing graph resulting from an RNA-Seq experiment, we find a min-cost flow in an offset flow network, under an equivalent cost model. Under very weak assumptions on the fitness model, the optimal flow can be computed in polynomial time. Parsimoniously splitting the flow back into few path transcripts can be done with any of the heuristics and approximations available from the theory of network flows. In the present implementation, we choose the simple strategy of repeatedly removing the heaviest path.
We proposed a new very general method based on network flows for a multiassembly problem arising from isoform identification and quantification with RNA-Seq. Experimental results on prediction accuracy show that our method is very competitive with popular tools such as Cufflinks and IsoLasso. Our tool, called Traph (Transcrips in gRAPHs), is available at: http://www.cs.helsinki.fi/gsa/traph/.
通过转录和选择性剪接,一个基因可以根据个体、细胞所在的组织或对某些刺激的反应,转录为不同的 RNA 序列(异构体)。最近的 RNA-Seq 技术允许基于短读长进行新的高通量异构体鉴定和定量方法,为此已经提出了各种方法。
在本文中,我们提出了一种基于最小成本网络流的全新方法。这种方法有两个优势:一方面,它将问题转化为网络流领域的一个已建立问题,可以使用不同的现有求解器在多项式时间内求解;另一方面,它足够通用,可以包含许多以前基于最小二乘法模型的建议。我们的方法如下:为了找到在给定适合度模型下最好地解释 RNA-Seq 实验产生的剪接图的转录本,我们在等效成本模型下的偏移流网络中找到最小成本流。在非常弱的适合度模型假设下,最优流可以在多项式时间内计算。通过任何来自网络流理论的启发式和近似方法,可以将流巧妙地拆分为少数路径转录本。在当前的实现中,我们选择了简单的策略,即反复删除最重的路径。
我们提出了一种新的基于网络流的通用方法,用于解决 RNA-Seq 中异构体鉴定和定量所产生的多组装问题。在预测准确性的实验结果表明,我们的方法与 Cufflinks 和 IsoLasso 等流行工具非常有竞争力。我们的工具名为 Traph(Graphs 中的转录本),可在 http://www.cs.helsinki.fi/gsa/traph/ 获得。