College of Science, Beijing Information Science and Technology University, 100192, Beijing, China.
Beijing GeneX Health Co., Ltd., 100195, Beijing, China.
Nat Commun. 2023 Sep 23;14(1):5935. doi: 10.1038/s41467-023-41649-0.
Single-molecule real-time isoform sequencing (Iso-seq) of transcriptomes by PacBio can generate very long and accurate reads, making it an ideal platform for full-length transcriptome analysis. We present TAGET, an integrated computational toolkit for analyzing Iso-seq full-length transcript data, covering transcript alignment, annotation, gene fusion detection, and quantification analyses such as differential gene expression analysis and differential isoform usage analysis. We evaluate the performance of TAGET using a public Iso-seq dataset and newly sequenced Iso-seq datasets from tumor patients. TAGET gives significantly more precise novel splice site prediction and enables more accurate discovery of novel isoforms and gene fusions, as confirmed by experimental validation and comparison with RNA-seq data. We identify and experimentally validate ECM1 as a gene with differential isoform usage, and further show that its isoform ECM1b may act as a tumor suppressor in laryngocarcinoma. Our results demonstrate that TAGET is a valuable computational toolkit that can be applied to many full-length transcriptome studies.