Earlham Institute, Norwich, Norfolk, NR4 7UZ, UK.
Plant Genome and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany.
BMC Biol. 2024 Mar 7;22(1):56. doi: 10.1186/s12915-024-01853-w.
RNA-seq is a fundamental technique in genomics, yet reference bias, where transcripts derived from non-reference alleles are quantified less accurately, can undermine the accuracy of RNA-seq quantification and thus the conclusions drawn downstream. Reference bias in RNA-seq analysis has not yet been explored in complex polyploid genomes, despite evidence that these genomes are often complex mosaics of introgressions from wild relatives, which introduce blocks of highly divergent genes.
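The mechanism behind reference bias can be illustrated with a deliberately simplified toy model. The sketch below makes several assumptions that are not from the study. It uses a hypothetical ungapped aligner that leaves a read unmapped when every placement has more than two mismatches. It treats an artificial 32 bp transcript as the reference allele. It builds a "divergent allele" by placing a substitution at every third base. The helpers `hamming`, `best_hit`, `count_reads`, `tile_reads` and `diverge` are illustrative names only. Real aligners such as STAR or HISAT2 score alignments differently. In the sketch, both alleles are "expressed" equally, with 12 simulated reads each, yet only reads from the reference allele are counted.

```python
# Toy model of reference bias (hypothetical, for illustration only):
# an ungapped aligner with a hard mismatch cap. Real aligners are far more
# sophisticated; this only shows why reads from a divergent allele can be lost.

TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(read, reference, max_mismatches):
    """Return (transcript_id, mismatches) for the best ungapped placement of
    `read`, or None if every placement exceeds `max_mismatches`."""
    best = None
    for tid, seq in reference.items():
        for start in range(len(seq) - len(read) + 1):
            mm = hamming(read, seq[start:start + len(read)])
            if mm <= max_mismatches and (best is None or mm < best[1]):
                best = (tid, mm)
    return best


def count_reads(reads, reference, max_mismatches=2):
    """Assign each read to its best-matching transcript; reads with no
    acceptable placement are dropped, i.e. left unmapped."""
    counts = {tid: 0 for tid in reference}
    for read in reads:
        hit = best_hit(read, reference, max_mismatches)
        if hit is not None:
            counts[hit[0]] += 1
    return counts


def tile_reads(seq, read_len=10, step=2):
    """Simulate reads by tiling fixed-length windows across a transcript."""
    return [seq[i:i + read_len] for i in range(0, len(seq) - read_len + 1, step)]


def diverge(seq, every=3):
    """Hypothetical divergent allele: a transition at every `every`-th base."""
    return "".join(TRANSITION[b] if i % every == every - 1 else b
                   for i, b in enumerate(seq))


reference_allele = "ACGT" * 8            # 32 bp toy transcript
divergent_allele = diverge(reference_allele)

ref_only = {"ref": reference_allele}
print(count_reads(tile_reads(reference_allele), ref_only))   # {'ref': 12}
print(count_reads(tile_reads(divergent_allele), ref_only))   # {'ref': 0}
```

Every 10 bp read from the divergent allele carries at least three mismatches against the reference. All of those reads are therefore discarded, and that allele's expression is underestimated. Real data are less extreme, but the direction of the effect is the same.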
Here, using hexaploid wheat as a model complex polyploid, we combine simulated and experimental data to show that RNA-seq alignment in wheat suffers from widespread reference bias, which is largely driven by divergent introgressed genes. This leads to underestimation of gene expression and incorrect assessment of homoeologue expression balance. By incorporating gene models from ten wheat genome assemblies into a pantranscriptome reference, we present a novel method to reduce reference bias. The method can be readily scaled to capture more variation as new genome and transcriptome data become available.
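The sketch below shows one plausible way a pantranscriptome reference and a homoeologue balance calculation could fit together. It is not necessarily the authors' pipeline. It makes four assumptions:

- Transcripts from each assembly are pooled under assembly-prefixed IDs.
- Transcript-level counts are summed back into gene groups through a mapping assumed to be known in advance, such as a reference gene plus its orthologues from the other assemblies.
- Homoeologue expression balance is taken as each homoeologue's share of total triad expression.
- All assembly names, gene IDs and counts are hypothetical.

```python
from collections import defaultdict


def build_pantranscriptome(assemblies):
    """Merge transcript sequences from several assemblies into one reference.

    `assemblies` maps an assembly name to {transcript_id: sequence}. IDs are
    prefixed with the assembly name so identical IDs cannot collide.
    """
    pan = {}
    for asm, transcripts in assemblies.items():
        for tid, seq in transcripts.items():
            pan[f"{asm}|{tid}"] = seq
    return pan


def collapse_to_groups(counts, group_of):
    """Sum transcript-level counts into gene groups.

    `group_of` maps each pantranscriptome ID to a group ID (assumed known,
    e.g. a reference gene together with its orthologues in other assemblies).
    """
    totals = defaultdict(float)
    for pan_id, count in counts.items():
        totals[group_of[pan_id]] += count
    return dict(totals)


def homoeologue_balance(a, b, d):
    """Relative expression of the A, B and D homoeologues of a triad (sums to 1)."""
    total = a + b + d
    if total == 0:
        raise ValueError("triad is not expressed")
    return a / total, b / total, d / total


# Hypothetical pantranscriptome with one gene present in two assemblies.
pan_ref = build_pantranscriptome({"ref": {"geneB": "ACGTACGT"},
                                  "alt": {"geneB": "ACGCACGC"}})
print(sorted(pan_ref))  # ['alt|geneB', 'ref|geneB']

# Hypothetical counts for one triad: most B-genome reads match the
# introgressed copy from assembly "alt" rather than the reference copy.
counts = {"ref|geneA": 100, "ref|geneB": 20, "alt|geneB": 80, "ref|geneD": 100}
group_of = {"ref|geneA": "A", "ref|geneB": "B", "alt|geneB": "B", "ref|geneD": "D"}

# If the 80 "alt" reads were lost (as in a reference-only alignment):
before = homoeologue_balance(100, 20, 100)
grouped = collapse_to_groups(counts, group_of)
after = homoeologue_balance(grouped["A"], grouped["B"], grouped["D"])
print([round(x, 2) for x in before])  # [0.45, 0.09, 0.45]
print([round(x, 2) for x in after])   # [0.33, 0.33, 0.33]
```

The prefixing step keeps reads that match a non-reference copy traceable. The grouping step then returns them to a single gene-level estimate. In this toy triad, that turns an apparently B-suppressed balance into a balanced one.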
This study shows that the presence of introgressions can lead to reference bias in wheat RNA-seq analysis. Researchers using non-sample reference genomes for RNA-seq alignment should exercise caution, and novel methods, such as the one presented here, should be considered.