Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
Life Sci Alliance. 2019 Jan 17;2(1). doi: 10.26508/lsa.201800175. Print 2019 Feb.
Most methods for statistical analysis of RNA-seq data take a matrix of abundance estimates for some type of genomic feature as their input, and consequently the quality of any obtained results is directly dependent on the quality of these abundances. Here, we present the junction coverage compatibility score, which provides a way to evaluate the reliability of transcript-level abundance estimates and the accuracy of transcript annotation catalogs. It works by comparing the observed number of reads spanning each annotated splice junction in a genomic region to the predicted number of junction-spanning reads, inferred from the estimated transcript abundances and the genomic coordinates of the corresponding annotated transcripts. We show that although most genes show good agreement between the observed and predicted junction coverages, a small set of genes does not. Genes with poor agreement are found regardless of the method used to estimate transcript abundances, and the corresponding transcript abundances should be treated with care in any downstream analyses.
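The comparison described above can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not the scoring formula from the paper. The function names (`predicted_junction_coverage`, `compatibility_score`), the per-transcript coverage values, and the normalized absolute-difference score are all hypothetical choices made for this example. Each transcript is assumed to contribute its estimated coverage equally to every junction it contains, and the predicted coverages are rescaled so that their sum matches the observed total before the two are compared.

```python
from collections import defaultdict


def predicted_junction_coverage(transcript_junctions, transcript_coverage):
    """Sum the estimated coverage of every transcript containing each junction.

    transcript_junctions: dict mapping transcript ID -> list of junction IDs
    transcript_coverage:  dict mapping transcript ID -> estimated coverage
                          (assumed here to be uniform along the transcript)
    """
    predicted = defaultdict(float)
    for tx, junctions in transcript_junctions.items():
        cov = transcript_coverage.get(tx, 0.0)
        for junction in junctions:
            predicted[junction] += cov
    return dict(predicted)


def compatibility_score(observed, predicted):
    """Illustrative score in [0, 1]; 0 means perfect agreement.

    Only annotated junctions (the keys of `predicted`) are compared.
    Predicted coverages are rescaled to the observed total, so the score
    reflects the relative distribution of reads across junctions.
    Returns None when there is nothing to compare.
    """
    junctions = sorted(predicted)
    obs = [float(observed.get(j, 0.0)) for j in junctions]
    pred = [predicted[j] for j in junctions]
    total_obs, total_pred = sum(obs), sum(pred)
    if total_obs == 0 or total_pred == 0:
        return None
    scale = total_obs / total_pred
    abs_diff = sum(abs(scale * p - o) for p, o in zip(pred, obs))
    # After scaling, the predicted and observed totals are equal, so the
    # largest possible absolute difference is 2 * total_obs.
    return abs_diff / (2 * total_obs)


# Toy gene with two annotated transcripts sharing junction J1.
tx_junctions = {"tx1": ["J1", "J2"], "tx2": ["J1", "J3"]}
tx_coverage = {"tx1": 30.0, "tx2": 10.0}
pred = predicted_junction_coverage(tx_junctions, tx_coverage)

good = compatibility_score({"J1": 40, "J2": 30, "J3": 10}, pred)
poor = compatibility_score({"J1": 40, "J2": 10, "J3": 30}, pred)
```

In this toy gene, the first set of observed counts matches the prediction exactly, while the second swaps the counts on J2 and J3, which is the kind of disagreement that would flag the estimated abundances of `tx1` and `tx2` as unreliable.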