INRA, UMR AGAP, F-34060, Montpellier, France.
CIRAD, UMR AGAP, Avenue Agropolis, F-34398, Montpellier, France.
Mol Ecol Resour. 2017 May;17(3):565-580. doi: 10.1111/1755-0998.12587. Epub 2016 Aug 29.
We produced a unique large data set of reference transcriptomes to obtain new knowledge about the evolution of plant genomes and crop domestication. For this purpose, we validated an RNA-Seq data assembly protocol for comparative population genomics. To validate it, we assessed and compared the quality of de novo Illumina short-read assemblies using data from two crops for which an annotated reference genome was available, namely grapevine and sorghum. We then used the same protocol to release 26 new transcriptomes of crop plants and their wild relatives, including still-understudied crops such as yam, pearl millet and fonio. The species list has wide taxonomic representation, comprising 15 monocots and 11 eudicots. All contigs were annotated using BLAST, prot4EST and Blast2GO. A distinctive feature of the data set is that each crop is associated with closely related species, which will permit whole-genome comparative evolutionary studies between crops and their wild relatives. This large resource will thus serve research communities working on both crops and model organisms. All the data are available at http://arcad-bioinformatics.southgreen.fr/.