Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam, Germany.
Int J Mol Sci. 2020 Mar 3;21(5):1720. doi: 10.3390/ijms21051720.
Quantification of gene expression is crucial to connect genome sequences with phenotypic and physiological data. RNA-Sequencing (RNA-Seq) has taken a prominent role in the study of transcriptomic reactions of plants to various environmental and genetic perturbations. However, comparative tests of different tools for RNA-Seq read mapping and quantification have been mainly performed on data from animals or humans, which necessarily neglect, for example, the large genetic variability among natural accessions within plant species. Here, we compared seven computational tools for their ability to map and quantify Illumina single-end reads from the Arabidopsis accessions Columbia-0 (Col-0) and N14. Between 92.4% and 99.5% of all reads were mapped to the reference genome or transcriptome, and the raw count distributions obtained from the different mappers were highly correlated. Differential gene expression (DGE) between plants exposed to 20 °C or 4 °C, determined from these read counts with the software DESeq2, showed a large pairwise overlap between the mappers. Interestingly, when the commercial CLC software was used with its own DGE module instead of DESeq2, strongly diverging results were obtained. When the same software was used for downstream processing, all tested mappers provided highly similar results, both for mapping Illumina reads of the two polymorphic Arabidopsis accessions to the reference genome or transcriptome and for the determination of DGE.
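The two comparisons summarized above (correlation of raw counts between mappers, and pairwise overlap of differentially expressed gene sets) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the use of log2(count + 1) before computing Pearson correlation, and the choice of the Jaccard index as the overlap measure are all assumptions made for illustration, and the mapper and gene labels are hypothetical placeholders.

```python
from itertools import combinations
from math import log2, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def count_correlation(counts_a, counts_b):
    """Correlate raw counts from two mappers over their shared genes.

    Counts are log2(count + 1) transformed first (an assumption of this
    sketch) so that a few highly expressed genes do not dominate.
    """
    genes = sorted(set(counts_a) & set(counts_b))
    log_a = [log2(counts_a[g] + 1) for g in genes]
    log_b = [log2(counts_b[g] + 1) for g in genes]
    return pearson(log_a, log_b)


def pairwise_overlap(deg_sets):
    """Jaccard index of the DEG sets for every pair of mappers.

    deg_sets maps a mapper label to the set of gene IDs called
    differentially expressed from that mapper's counts.
    """
    overlap = {}
    for a, b in combinations(sorted(deg_sets), 2):
        shared = deg_sets[a] & deg_sets[b]
        union = deg_sets[a] | deg_sets[b]
        overlap[(a, b)] = len(shared) / len(union) if union else 1.0
    return overlap


if __name__ == "__main__":
    # Hypothetical toy data; labels and values are placeholders only.
    counts = {
        "mapper_A": {"gene1": 0, "gene2": 12, "gene3": 250},
        "mapper_B": {"gene1": 1, "gene2": 10, "gene3": 240},
    }
    degs = {
        "mapper_A": {"gene2", "gene3"},
        "mapper_B": {"gene3"},
    }
    print(count_correlation(counts["mapper_A"], counts["mapper_B"]))
    print(pairwise_overlap(degs))
```

A Jaccard index near 1 for every pair would correspond to the "large pairwise overlap" described above. Other overlap measures, such as the shared fraction relative to the smaller set, would be equally reasonable choices.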