Pertea Mihaela, Pertea Geo M, Antonescu Corina M, Chang Tsung-Cheng, Mendell Joshua T, Salzberg Steven L
[1] Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland, USA. [2] McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland, USA.
[1] Department of Molecular Biology, The University of Texas Southwestern Medical Center, Dallas, Texas, USA. [2] Center for Regenerative Science and Medicine, The University of Texas Southwestern Medical Center, Dallas, Texas, USA.
Nat Biotechnol. 2015 Mar;33(3):290-5. doi: 10.1038/nbt.3122. Epub 2015 Feb 18.
Methods used to sequence the transcriptome often produce more than 200 million short sequences. We introduce StringTie, a computational method that applies a network flow algorithm originally developed in optimization theory, together with optional de novo assembly, to assemble these complex data sets into transcripts. When used to analyze both simulated and real data sets, StringTie produces more complete and accurate reconstructions of genes and better estimates of expression levels, compared with other leading transcript assembly programs including Cufflinks, IsoLasso, Scripture and Traph. For example, on 90 million reads from human blood, StringTie correctly assembled 10,990 transcripts, whereas the next best assembly was of 7,187 transcripts by Cufflinks, which is a 53% increase in transcripts assembled. On a simulated data set, StringTie correctly assembled 7,559 transcripts, which is 20% more than the 6,310 assembled by Cufflinks. As well as producing a more complete transcriptome assembly, StringTie runs faster on all data sets tested to date compared with other assembly software, including Cufflinks.
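The relative gains quoted in the abstract can be checked directly from the transcript counts it reports. The short sketch below only redoes that arithmetic; the helper name `percent_increase` is a hypothetical one chosen for illustration and is not part of StringTie or any other tool named here.

```python
def percent_increase(new: int, baseline: int) -> float:
    """Relative gain of `new` over `baseline`, in percent (illustrative helper)."""
    return (new - baseline) / baseline * 100.0


# Counts of correctly assembled transcripts, as quoted in the abstract.
blood = percent_increase(10_990, 7_187)     # StringTie vs. Cufflinks, human blood reads
simulated = percent_increase(7_559, 6_310)  # StringTie vs. Cufflinks, simulated data

print(f"human blood: {blood:.1f}% more transcripts")
print(f"simulated:   {simulated:.1f}% more transcripts")
```

Rounded to whole percentages, the two results agree with the 53% and 20% figures stated in the abstract.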