IEEE/ACM Trans Comput Biol Bioinform. 2020 May-Jun;17(3):938-948. doi: 10.1109/TCBB.2018.2808350. Epub 2018 Feb 21.
High-throughput sequencing of mRNA has made deep and efficient probing of the transcriptome more affordable. However, the vast number of short RNA-seq reads makes de novo transcriptome assembly an algorithmic challenge. In this work, we present IsoTree, a novel framework for transcript reconstruction in the absence of a reference genome. Unlike most de novo assembly methods, which build a de Bruijn graph or splicing graph by connecting k-mers (sets of overlapping substrings generated from reads), IsoTree constructs splicing graphs by connecting reads directly. For each splicing graph, IsoTree applies an iterative mixed integer linear programming scheme to build a prefix tree, called an isoform tree. Each path from the root node of the isoform tree to a leaf node represents a plausible transcript candidate, and these candidates are then pruned based on paired-end read information. Experiments showed that in most cases IsoTree performs better than other leading transcriptome assembly programs. IsoTree is available at https://github.com/Jane110111107/IsoTree.
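To make the isoform tree idea concrete, the following is a minimal illustrative sketch, not IsoTree's implementation. It assumes a toy splicing graph given as a small DAG, unfolds it exhaustively into a prefix tree (IsoTree instead uses an iterative mixed integer linear program to decide how the tree grows, which is omitted here), lists every root-to-leaf path as a transcript candidate, and then applies a deliberately simple, hypothetical pruning rule: a candidate is kept only if it contains both ends of at least one observed paired-end link, in order. The graph, the node names, and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy splicing graph: nodes are exon-like segments,
# edges are splice junctions. Must be acyclic for the unfolding below.
GRAPH: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}


def build_prefix_tree(graph: Dict[str, List[str]], node: str) -> dict:
    """Unfold the DAG below `node` into a nested-dict prefix tree.

    Exhaustive unfolding stands in for the MILP-guided construction
    described in the abstract; it is only practical for tiny graphs.
    """
    return {child: build_prefix_tree(graph, child) for child in graph[node]}


def root_to_leaf_paths(tree: dict, prefix: List[str]) -> List[List[str]]:
    """Return every path from the root (already in `prefix`) to a leaf."""
    if not tree:  # an empty subtree means `prefix` ends at a leaf
        return [prefix]
    paths: List[List[str]] = []
    for node, subtree in tree.items():
        paths.extend(root_to_leaf_paths(subtree, prefix + [node]))
    return paths


def prune_by_pairs(
    paths: List[List[str]], pairs: Set[Tuple[str, str]]
) -> List[List[str]]:
    """Hypothetical pruning rule: keep a path if at least one observed
    paired-end link (a, b) has a occurring before b on that path."""
    kept = []
    for path in paths:
        position = {node: i for i, node in enumerate(path)}
        if any(
            a in position and b in position and position[a] < position[b]
            for a, b in pairs
        ):
            kept.append(path)
    return kept


tree = build_prefix_tree(GRAPH, "A")
candidates = root_to_leaf_paths(tree, ["A"])
# Assumed paired-end evidence: one read pair links segments B and D.
supported = prune_by_pairs(candidates, {("B", "D")})
print(candidates)
print(supported)
```

In this toy case the tree yields two candidates, A-B-D and A-C-D, and only A-B-D survives pruning because the assumed read pair links B and D. A real assembler would need a far more careful support criterion, for example one that accounts for insert size and coverage.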