McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
Nat Commun. 2019 Nov 1;10(1):5000. doi: 10.1038/s41467-019-12990-0.
Transcript assembly from RNA-seq reads is a critical step in gene expression and subsequent functional analyses. Here we present PsiCLASS, an accurate and efficient transcript assembler that analyzes multiple RNA-seq samples simultaneously. PsiCLASS combines mixture statistical models for exonic feature selection across multiple samples with splice-graph-based dynamic programming algorithms and a weighted voting scheme for transcript selection. PsiCLASS achieves a significantly better sensitivity-precision tradeoff, with precision up to 2- to 3-fold higher than that of the StringTie system and of Scallop plus TACO, the two best current approaches. PsiCLASS is efficient and scalable, assembling 667 GEUVADIS samples in 9 h, and its accuracy remains robust with large numbers of samples.
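To make the idea of weighted voting across samples concrete, here is a minimal Python sketch. It is not PsiCLASS's actual scheme, which the abstract does not specify. It assumes each transcript is represented by its intron chain, that each sample votes with a weight equal to the transcript's share of that sample's total abundance, and that a transcript is kept when its summed vote reaches a threshold. The function name `select_transcripts`, the `min_score` parameter, and the toy abundances are all hypothetical.

```python
from collections import defaultdict


def select_transcripts(per_sample, min_score):
    """Toy weighted vote over per-sample transcript sets.

    per_sample: list of dicts, one per sample, mapping a transcript
        (a tuple of (start, end) intron coordinates) to its abundance.
    min_score: hypothetical cutoff on the summed vote.
    Returns the kept transcripts, highest vote first.
    """
    votes = defaultdict(float)
    for sample in per_sample:
        total = sum(sample.values())
        if total <= 0:
            continue  # a sample with no signal casts no votes
        for chain, abundance in sample.items():
            # Assumed weighting: the transcript's share of this sample.
            votes[chain] += abundance / total
    kept = [chain for chain, score in votes.items() if score >= min_score]
    return sorted(kept, key=lambda chain: votes[chain], reverse=True)


# Illustrative data only: three samples, three candidate transcripts.
A = ((100, 200), (300, 400))
B = ((100, 200),)
C = ((100, 250),)
samples = [
    {A: 8.0, B: 2.0},
    {A: 6.0, C: 4.0},
    {A: 5.0, B: 5.0},
]
selected = select_transcripts(samples, min_score=0.6)
```

In this toy input, transcript `C` is supported by a single sample at moderate abundance and falls below the cutoff, while `A` and `B` accumulate support from several samples. Normalizing by each sample's total is one simple way to keep deeply sequenced samples from dominating the vote. Other weightings are equally plausible.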