Information and Computational Sciences, James Hutton Institute, Dundee, UK.
Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute, Dundee, UK.
Life Sci Alliance. 2022 Apr 22;5(8). doi: 10.26508/lsa.202101255. Print 2022 Aug.
It is increasingly apparent that although different genotypes within a species share "core" genes, they also contain variable numbers of "specific" genes, and different structures of "core" genes, that are present in only a subset of individuals. Using a common reference genome may thus lead to a loss of genotype-specific information in the assembled Reference Transcript Dataset (RTD) and to erroneous, incomplete or misleading transcriptomics analysis results. In this study, we assembled a genotype-specific RTD (sRTD) and a common reference-based RTD (cRTD) from RNA-seq data of cultivated Barke and Morex barley, respectively. Our quantitative evaluation showed that the sRTD has a significantly higher diversity of transcripts and alternative splicing events, whereas the cRTD missed 40% of the transcripts present in the sRTD and only ∼70% of its transcript assemblies were accurate. We found that the sRTD is more accurate for transcript quantification as well as differential expression analysis. However, gene-level quantification is less affected, so gene-level analysis may be a reasonable compromise when a high-quality genotype-specific reference is not available.