Netanya Keil, Carolina Monzó, Lauren McIntyre, Ana Conesa
Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL, USA, 32610.
University of Florida Genetics Institute, University of Florida, Gainesville, FL, USA, 32610.
bioRxiv. 2024 Sep 17:2024.08.23.609463. doi: 10.1101/2024.08.23.609463.
SQANTI-reads leverages SQANTI3, a tool for assessing the quality of transcript models, to provide a read-level quality control (QC) framework for replicated long-read RNA-seq experiments. The number and distribution of reads, as well as the number and distribution of unique junction chains (UJCs, i.e. transcript splicing patterns), across SQANTI3 structural categories are informative of raw data quality. Multi-sample visualizations of QC metrics are organized by experimental design factors to help identify outlier samples. We introduce new metrics 1) to identify potentially under-annotated genes and putative novel transcripts and 2) to quantify variation in junction donors and acceptors. We applied SQANTI-reads to two datasets, a developmental experiment and a multi-platform dataset from the LRGASP project, and demonstrate that the tool effectively reveals the impact of read coverage on data quality and readily identifies strong and weak splice sites. SQANTI-reads is open source and available for download on GitHub.
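To make the per-category summary concrete, here is a minimal sketch, assuming a hypothetical in-memory input where each read already carries a SQANTI3 structural category label and its junction chain (an ordered tuple of donor/acceptor coordinates). It counts reads and unique junction chains per category and reports each category's share of reads. The input layout, the function name `category_summary`, and the toy coordinates are illustrative assumptions, not the SQANTI3 classification file format or SQANTI-reads' actual code.

```python
from collections import Counter, defaultdict

# Hypothetical toy input (not real data): (read_id, chrom, strand,
# SQANTI3 structural category, junction chain as ((donor, acceptor), ...)).
reads = [
    ("r1", "chr1", "+", "full-splice_match", ((100, 200), (300, 400))),
    ("r2", "chr1", "+", "full-splice_match", ((100, 200), (300, 400))),
    ("r3", "chr1", "+", "incomplete-splice_match", ((300, 400),)),
    ("r4", "chr1", "+", "novel_not_in_catalog", ((100, 205), (300, 400))),
]


def category_summary(reads):
    """Count reads and unique junction chains (UJCs) per structural category."""
    read_counts = Counter()
    ujcs = defaultdict(set)
    for _read_id, chrom, strand, category, chain in reads:
        read_counts[category] += 1
        # Assumption: a UJC is keyed by chromosome, strand and the full junction chain,
        # so reads sharing an identical splicing pattern collapse into one UJC.
        ujcs[category].add((chrom, strand, chain))
    total = sum(read_counts.values())
    summary = {}
    for category, n_reads in read_counts.items():
        n_ujcs = len(ujcs[category])
        summary[category] = {
            "reads": n_reads,
            "read_fraction": n_reads / total,  # distribution of reads across categories
            "ujcs": n_ujcs,
            "reads_per_ujc": n_reads / n_ujcs,  # coverage per splicing pattern
        }
    return summary


if __name__ == "__main__":
    for category, stats in category_summary(reads).items():
        print(category, stats)
```

Computing the same summary per sample and plotting it by experimental design factor is one way such counts could expose outlier samples; how SQANTI-reads itself defines and plots these metrics should be taken from its documentation.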