Subtropical Horticulture Research Station, USDA-ARS, 13601 Old Cutler Road, Miami, FL 33158, USA.
BMC Genomics. 2011 Aug 16;12:413. doi: 10.1186/1471-2164-12-413.
The fermented, dried seeds of Theobroma cacao (cacao tree) are the main ingredient in chocolate. World cocoa production was estimated at 3 million tons in 2010, with an estimated average annual growth rate of 2.2%. The cacao bean production industry is currently under threat from a rise in fungal diseases, including black pod, frosty pod, and witches' broom. To address these issues, genome-sequencing efforts have recently been initiated to facilitate the identification of genetic markers and genes that could be used to accelerate the release of robust T. cacao cultivars. However, problems inherent in the assembly and resolution of distal regions of complex eukaryotic genomes, such as gaps, chimeric joins, and unresolvable repeat-induced compressions, have unavoidably been encountered with the sequencing strategies selected.
Here, we describe the construction of a BAC-based integrated genetic-physical map of the T. cacao cultivar Matina 1-6, which is designed to augment and enhance these sequencing efforts. Three BAC libraries, each providing 10× coverage, were constructed and fingerprinted. A total of 230 genetic markers from a high-resolution genetic recombination map and 96 Arabidopsis-derived conserved ortholog set (COS) II markers were anchored using pooled overgo hybridization. A dense tile path consisting of 29,383 BACs was selected and end-sequenced. The physical map consists of 154 contigs and 4,268 singletons. Forty-nine contigs are genetically anchored and ordered to chromosomes for a total span of 307.2 Mbp. The 105 unanchored contigs span 67.4 Mbp; summing the two spans gives an estimated T. cacao genome size of 374.6 Mbp. A comparative analysis with A. thaliana, V. vinifera, and P. trichocarpa suggests that comparisons of the genome assemblies of these distantly related species could provide insights into genome structure, evolutionary history, conservation of functional sites, and improvements in physical map assembly. A comparison between the two T. cacao cultivars Matina 1-6 and Criollo indicates a high degree of collinearity in their genomes, although rearrangements were also observed.
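The contig counts and the genome-size estimate reported above follow from simple addition. The minimal sketch below only re-derives them from the figures quoted in the abstract. The variable names are illustrative, and the anchored fraction is a derived value that the abstract does not state.

```python
# Figures quoted in the abstract (spans in Mbp).
anchored_contigs = 49
unanchored_contigs = 105
anchored_span_mbp = 307.2
unanchored_span_mbp = 67.4

# Total contigs in the physical map: anchored plus unanchored.
total_contigs = anchored_contigs + unanchored_contigs

# Estimated genome size: sum of the anchored and unanchored spans,
# rounded to one decimal to avoid floating-point noise.
estimated_genome_mbp = round(anchored_span_mbp + unanchored_span_mbp, 1)

# Derived here for illustration only (not stated in the abstract):
# the share of the estimated genome covered by anchored contigs.
anchored_fraction = anchored_span_mbp / estimated_genome_mbp

print(f"total contigs: {total_contigs}")
print(f"estimated genome size: {estimated_genome_mbp} Mbp")
print(f"anchored fraction: {anchored_fraction:.1%}")
```

Under these numbers, roughly four fifths of the estimated genome lies in contigs that are anchored and ordered to chromosomes.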
The results presented in this study are a stand-alone resource for functional exploitation and enhancement of Theobroma cacao but are also expected to complement and augment ongoing genome-sequencing efforts. This resource will serve as a template for refinement of the T. cacao genome through gap-filling, targeted re-sequencing, and resolution of repetitive DNA arrays.