Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium.
VIB Center for Plant Systems Biology, VIB, Ghent, Belgium.
Methods Mol Biol. 2023;2545:47-76. doi: 10.1007/978-1-0716-2561-3_3.
Polyploidizations, or whole-genome duplications (WGDs), in plants have increased biological complexity, facilitated evolutionary innovation, and likely enabled adaptation under harsh conditions. Besides genomic data, transcriptome data have been widely employed to detect WGDs because they provide efficient access to the gene space of a species. Age distributions based on synonymous substitutions (so-called K_S age distributions) for paralogs assembled from transcriptome data have identified numerous WGDs in plants, paving the way for further studies on the importance of WGDs for the evolution of seed and flowering plants. However, it is still unclear how transcriptome-based age distributions compare to those based on genomic data. In this chapter, we implemented three different de novo transcriptome assembly pipelines with two popular assemblers, i.e., Trinity and SOAPdenovo-Trans. We selected six plant species with published genomes and transcriptomes to evaluate how well transcripts assembled by the different pipelines perform when K_S distributions are used to detect previously documented WGDs in those species. Further, using the genes predicted in each genome as references, we evaluated the effects of missing genes, gene family clustering, and de novo assembled transcripts on the transcriptome-based K_S distributions. Our results show that, although the transcriptome-based K_S distributions differ from the genome-based ones in shape and scale, they are still reasonably reliable for unveiling WGDs, except in species where most duplicates originated from a recent WGD. We also discuss how to overcome some possible pitfalls when using transcriptome data to identify WGDs.
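To illustrate the idea behind an age distribution of paralogs, the following minimal sketch bins pairwise synonymous-substitution estimates into a histogram and reports bins that stand out as local maxima, which is roughly how a WGD signature is first spotted by eye. It is not the pipeline used in the chapter: the function names, the bin width of 0.1, the upper cutoff of 3.0, and the toy input values are all assumptions made for illustration, and real analyses typically rely on mixture models or kernel density estimates rather than raw local maxima.

```python
from typing import List, Sequence


def ks_histogram(ks_values: Sequence[float],
                 bin_width: float = 0.1,
                 max_ks: float = 3.0) -> List[int]:
    """Count paralog pairs per bin of synonymous distance.

    bin_width and max_ks are illustrative defaults, not values from the chapter.
    Values outside the open interval (0, max_ks) are ignored.
    """
    n_bins = int(round(max_ks / bin_width))
    counts = [0] * n_bins
    for ks in ks_values:
        if 0.0 < ks < max_ks:
            # Clamp the index to guard against floating-point edge cases.
            index = min(int(ks / bin_width), n_bins - 1)
            counts[index] += 1
    return counts


def local_maxima(counts: Sequence[int], min_count: int = 2) -> List[int]:
    """Return indices of interior bins that are higher than the previous bin,
    at least as high as the next bin, and hold at least min_count pairs."""
    peaks = []
    for i in range(1, len(counts) - 1):
        if (counts[i] >= min_count
                and counts[i] > counts[i - 1]
                and counts[i] >= counts[i + 1]):
            peaks.append(i)
    return peaks


# Hypothetical toy input: a few recent duplicates plus a cluster near 0.7-0.8.
toy_ks = [0.05, 0.15, 0.65, 0.72, 0.75, 0.78, 0.85, 1.55]
toy_counts = ks_histogram(toy_ks)
toy_peaks = local_maxima(toy_counts)
```

With the toy input, the cluster of pairs between 0.7 and 0.8 forms the only bin that passes the `min_count` filter. In a transcriptome-based analysis the input values would come from an external substitution-rate estimator applied to paralog pairs within gene families, a step this sketch leaves out.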