The Date Palm Genome Project, King Abdulaziz City for Science and Technology, Riyadh, Kingdom of Saudi Arabia.
PLoS One. 2010 Sep 15;5(9):e12762. doi: 10.1371/journal.pone.0012762.
Date palm (Phoenix dactylifera L.), a member of the Arecaceae family, is one of the three major economically important woody palms--the other two being oil palm and coconut tree--and its fruit is a staple food in Middle Eastern and North African nations, as well as in many other tropical and subtropical regions. Here we report a complete sequence of the date palm chloroplast (cp) genome based on pyrosequencing.
METHODOLOGY/PRINCIPAL FINDINGS: After extracting 369,022 cp sequencing reads from our whole-genome-shotgun data, we assembled them and validated the assembly with intensive PCR-based verification coupled with sequencing of the PCR products. The date palm cp genome is 158,462 bp in length and has a typical quadripartite structure: a large single-copy region (LSC, 86,198 bp) and a small single-copy region (SSC, 17,712 bp) separated by a pair of inverted repeats (IRs, 27,276 bp each). Similar to what has been found among most angiosperms, the date palm cp genome harbors 112 unique genes and 19 duplicated fragments in the IR regions. The LSC/IR and SSC/IR junctions show different features of sequence expansion during evolution. We identified 78 SNPs as major intravarietal polymorphisms within the population of a specific cp genome, most of which were located in genes with vital functions. Based on RNA-sequencing data, we also found 18 polycistronic transcription units and three genes with highly biased expression--atpF, trnA-UGC, and rrn23.
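The reported region lengths can be checked against the reported genome length with a short calculation. The sketch below is only an illustrative consistency check, not part of the authors' pipeline; the function and variable names are hypothetical, and it assumes the 27,276 bp figure is the length of each of the two IR copies.

```python
# Region lengths in bp, taken from the abstract.
LSC = 86_198             # large single-copy region
SSC = 17_712             # small single-copy region
IR = 27_276              # one inverted-repeat copy (assumed: per-copy length)
REPORTED_TOTAL = 158_462


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Length of a quadripartite cp genome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LSC, SSC, IR)
ir_fraction = 2 * IR / total  # share of the genome covered by both IR copies

print(f"computed total: {total} bp")
print(f"matches reported total: {total == REPORTED_TOTAL}")
print(f"IR share of genome: {ir_fraction:.1%}")
```

Under that assumption the four regions sum exactly to the reported 158,462 bp, which supports reading the IR length as a per-copy value.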
Unlike most monocots, date palm has a typical cp genome similar to that of tobacco--with little rearrangement and gene loss or gain. High-throughput sequencing technology facilitates the identification of intravarietal variations in cp genomes among different cultivars. Moreover, transcriptomic analysis of cp genes provides clues for uncovering regulatory mechanisms of transcription and translation in chloroplasts.