Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Brazil.
Plant J. 2020 Aug;103(5):1894-1909. doi: 10.1111/tpj.14850. Epub 2020 Aug 13.
Soybean (Glycine max [L.] Merr.) is a major crop for animal feed and human nutrition, mainly because of its rich protein and oil contents. The remarkable rise in soybean transcriptome studies over the past 5 years has generated an enormous amount of RNA-seq data, encompassing various tissues, developmental conditions and genotypes. In this study, we collected data from 1298 publicly available soybean transcriptome samples, processed the raw sequencing reads and mapped them to the soybean reference genome in a systematic fashion. We found that 94% of the annotated genes (52 737/56 044) had detectable expression in at least one sample. Unsupervised clustering revealed three major groups, comprising samples from aerial, underground and seed/seed-related parts. We found 452 genes with uniform and constant expression levels, supporting their roles as housekeeping genes. On the other hand, 1349 genes showed expression patterns heavily biased towards particular tissues. A transcript-level analysis revealed that 95% (70 963 of 74 490) of the assembled transcripts have intron chains exactly matching those of known transcripts, whereas 3256 assembled transcripts represent potentially novel splicing isoforms. The dataset compiled here constitutes a new resource for the community, which can be downloaded or accessed through a user-friendly web interface at http://venanciogroup.uenf.br/resources/. This comprehensive transcriptome atlas will likely accelerate research on soybean genetics and genomics.
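As a quick sanity check on the proportions quoted in the abstract, the reported counts can be recomputed directly. The sketch below is only illustrative arithmetic on the published numbers. The variable names are hypothetical, and the script is not part of the authors' pipeline.

```python
# Counts quoted in the abstract; variable names are illustrative only.
expressed_genes, annotated_genes = 52_737, 56_044
matching_transcripts, assembled_transcripts = 70_963, 74_490
novel_isoforms = 3_256


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


gene_pct = percent(expressed_genes, annotated_genes)
transcript_pct = percent(matching_transcripts, assembled_transcripts)

# Assembled transcripts whose intron chain does not exactly match a known transcript.
non_matching = assembled_transcripts - matching_transcripts

# Non-matching transcripts that are not counted as potentially novel isoforms.
# The abstract does not say how these are classified.
other_non_matching = non_matching - novel_isoforms

print(f"Genes with detectable expression: {gene_pct:.1f}%")
print(f"Transcripts with exact intron-chain match: {transcript_pct:.1f}%")
print(f"Non-matching transcripts: {non_matching}")
print(f"Non-matching, not reported as novel isoforms: {other_non_matching}")
```

Both percentages round to the values given in the abstract (94% and 95%). The 3256 potentially novel isoforms account for most, but not all, of the transcripts without an exact intron-chain match.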