IEEE/ACM Trans Comput Biol Bioinform. 2022 Jan-Feb;19(1):48-56. doi: 10.1109/TCBB.2021.3083277. Epub 2022 Feb 3.
Recent advances in RNA-seq technology have made the identification of expressed genes affordable, thereby accelerating the development of transcriptomic studies. Transcriptome assembly, which reconstructs all expressed transcripts from RNA-seq reads, is an essential step toward understanding genes, proteins, and cell functions. It remains a challenging problem because of complications arising from splicing variants, varying expression levels, uneven coverage, and sequencing errors. Here, we formulate transcriptome assembly as path extraction on splicing graphs (or assembly graphs) and propose MultiTrans, a novel path-extraction algorithm based on mixed integer linear programming. MultiTrans simultaneously takes into account coverage constraints on vertices and edges, the number of paths, and paired-end information. We benchmarked MultiTrans against two state-of-the-art transcriptome assemblers, TransLiG and rnaSPAdes. Experimental results show that MultiTrans generates more accurate transcripts than TransLiG (using the same splicing graphs) and rnaSPAdes (using the same assembly graphs). MultiTrans is freely available at https://github.com/jzbio/MultiTrans.
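The abstract does not spell out the MultiTrans model, so the following is only a rough illustration of what "path extraction with coverage constraints and a limit on the number of paths" can mean. It is a minimal sketch, assuming a simplified objective: the absolute deviation between observed and predicted coverage on vertices and edges, plus a fixed penalty per selected path. It solves that objective by exhaustive search rather than by mixed integer linear programming, which is feasible only for tiny graphs. Paired-end constraints are omitted. The function names, parameters (`max_paths`, `max_weight`, `path_penalty`) and the toy graph are hypothetical.

```python
from itertools import combinations, product


def enumerate_paths(graph, source, sink):
    """Return all source-to-sink paths in a DAG given as an adjacency dict."""
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == sink:
            paths.append(tuple(path))
            continue
        for nxt in graph.get(node, []):
            stack.append((nxt, path + [nxt]))
    return paths


def coverage_error(chosen, weights, vertex_cov, edge_cov):
    """Sum of |observed - predicted| coverage over all vertices and edges.

    Predicted coverage of a vertex/edge is the total weight (abundance)
    of the chosen paths that pass through it.
    """
    pred_v, pred_e = {}, {}
    for path, w in zip(chosen, weights):
        for v in path:
            pred_v[v] = pred_v.get(v, 0) + w
        for e in zip(path, path[1:]):
            pred_e[e] = pred_e.get(e, 0) + w
    err = sum(abs(c - pred_v.get(v, 0)) for v, c in vertex_cov.items())
    err += sum(abs(c - pred_e.get(e, 0)) for e, c in edge_cov.items())
    return err


def extract_paths(graph, source, sink, vertex_cov, edge_cov,
                  max_paths=2, max_weight=10, path_penalty=1):
    """Hypothetical brute-force stand-in for an MILP path-extraction step.

    Tries every set of at most `max_paths` paths and every integer weight
    in 1..max_weight, minimizing coverage error + path_penalty * #paths.
    """
    paths = enumerate_paths(graph, source, sink)
    best_cost, best_solution = float("inf"), ()
    for r in range(1, max_paths + 1):
        for chosen in combinations(paths, r):
            for weights in product(range(1, max_weight + 1), repeat=r):
                cost = (coverage_error(chosen, weights, vertex_cov, edge_cov)
                        + path_penalty * r)
                if cost < best_cost:
                    best_cost, best_solution = cost, tuple(zip(chosen, weights))
    return best_cost, best_solution


# Toy splicing graph: exon A splices to either B or C, both join exon D.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
vertex_cov = {"A": 10, "B": 6, "C": 4, "D": 10}
edge_cov = {("A", "B"): 6, ("A", "C"): 4, ("B", "D"): 6, ("C", "D"): 4}

cost, solution = extract_paths(graph, "A", "D", vertex_cov, edge_cov)
for path, weight in solution:
    print("-".join(path), weight)
```

On this toy graph the search recovers the two isoforms A-B-D (abundance 6) and A-C-D (abundance 4), which explain the coverage exactly, so the cost is just the penalty for two paths. An MILP formulation would express the same trade-off with binary path-selection variables and integer abundances, letting a solver avoid the exponential enumeration used here.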