Biomass Research Platform Team, Biomass Engineering Program Cooperation Division, RIKEN Center for Sustainable Resource Science, Tsurumi-ku, Yokohama, Kanagawa, Japan; Kihara Institute for Biological Research, Yokohama City University, Totsuka-ku, Yokohama, Kanagawa, Japan.
PLoS One. 2013 Oct 9;8(10):e75265. doi: 10.1371/journal.pone.0075265. eCollection 2013.
A comprehensive collection of full-length cDNAs is essential for correct structural gene annotation and functional analyses of genes. We constructed a mixed full-length cDNA library from 21 different tissues of Brachypodium distachyon Bd21, and obtained 78,163 high quality expressed sequence tags (ESTs) from both ends of ca. 40,000 clones (including 16,079 contigs). We updated gene structure annotations of Brachypodium genes based on full-length cDNA sequences in comparison with the latest publicly available annotations. About 10,000 non-redundant gene models were supported by full-length cDNAs; ca. 6,000 showed some transcription unit modifications. We also found ca. 580 novel gene models, including 362 newly identified in Bd21. Using the updated transcription start sites, we searched a total of 580 plant cis-motifs in the -3 kb promoter regions and determined a genome-wide Brachypodium promoter architecture. Furthermore, we integrated the Brachypodium full-length cDNAs and updated gene structures with available sequence resources in wheat and barley in a web-accessible database, the RIKEN Brachypodium FL cDNA database. The database represents a "one-stop" information resource for all genomic information in the Pooideae, facilitating functional analysis of genes in this model grass plant and seamless knowledge transfer to the Triticeae crops.