Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588, USA; School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE 68588, USA; Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA.
Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588, USA; School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE 68588, USA.
Methods. 2020 Apr 1;176:14-24. doi: 10.1016/j.ymeth.2019.06.001. Epub 2019 Jun 6.
Whole genome duplications (WGD) occur widely in plants, but their effects are felt in all branches of life. WGD events have major evolutionary consequences, often leading to large structural changes within chromosomes and massive changes in gene expression that facilitate rapid speciation and gene diversification. Even in species that are currently diploid, traces of ancestral duplication events persist in the genome, especially as highly similar gene families retained from WGD. However, the impact of polyploidy on bioinformatics workflows has not been well studied. In this review, we give an overview of the biological significance of polyploidy in different organisms. We describe how polyploid transcriptomes affect bioinformatics analyses, focusing on transcriptome assembly and transcript quantification. We discuss the benefits of using simulated benchmarking data to examine the performance of various methods. We also present an example strategy for generating simulated allopolyploid transcriptomes and RNAseq datasets, and show how these benchmark datasets can be used to assess the performance of transcript assembly and quantification methods. Our benchmarking study shows that all transcriptome assembly methods are affected by polyploid genomes. Quantification accuracy is also affected by polyploidy, to a degree that depends on the method. These simulated datasets can be adapted for testing other analyses, such as read mapping, variant calling, and differential expression, under biologically realistic conditions.
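The abstract does not spell out the simulation strategy, so the following is only a toy sketch of the general idea, not the authors' pipeline. It assumes an allopolyploid transcriptome can be approximated by pairing each diploid transcript with a slightly diverged homeolog copy, and that reads are sampled without sequencing errors so that the true origin of every read is known. All function names, the 3% divergence rate, and the uniform expression levels are hypothetical choices made for illustration.

```python
import random
from collections import Counter

BASES = "ACGT"

def diverge(seq, rate, rng):
    """Return a copy of seq in which each base is substituted with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def make_allopolyploid(transcripts, rate, rng):
    """Pair every transcript (subgenome A) with a diverged homeolog (subgenome B)."""
    combined = {}
    for name, seq in transcripts.items():
        combined[name + "_A"] = seq
        combined[name + "_B"] = diverge(seq, rate, rng)
    return combined

def simulate_reads(transcriptome, expression, read_len, n_reads, rng):
    """Sample error-free reads; each read is returned with its true source transcript."""
    names = sorted(transcriptome)
    # Weight = expression level times the number of valid read start positions.
    weights = [
        expression[n] * max(len(transcriptome[n]) - read_len + 1, 0) for n in names
    ]
    reads = []
    for name in rng.choices(names, weights=weights, k=n_reads):
        seq = transcriptome[name]
        start = rng.randrange(len(seq) - read_len + 1)
        reads.append((name, seq[start:start + read_len]))
    return reads

# Hypothetical toy input: two random "diploid" transcripts.
rng = random.Random(42)
diploid = {
    "gene1": "".join(rng.choice(BASES) for _ in range(300)),
    "gene2": "".join(rng.choice(BASES) for _ in range(200)),
}
tetraploid = make_allopolyploid(diploid, rate=0.03, rng=rng)
expression = {name: 1.0 for name in tetraploid}  # assumed uniform expression
reads = simulate_reads(tetraploid, expression, read_len=50, n_reads=1000, rng=rng)

# Ground-truth read counts per transcript, usable as a quantification benchmark.
true_counts = Counter(origin for origin, _ in reads)
```

Because the source of every read is recorded, `true_counts` can serve as the reference against which an assembler's or quantifier's output is compared. A realistic benchmark would use real transcript sequences, a dedicated read simulator with an error model, and biologically motivated expression profiles.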