Baek Insuck, Cha Minhyeok, Lim Seunghyun, Irish Brian M, Oh Sookyung, Bhatt Jishnu, Upadhyay Rakesh K, Kim Moon S, Meinhardt Lyndel W, Park Sunchung, Ahn Ezekiel
Environmental Microbial and Food Safety Laboratory, Agricultural Research Service, Department of Agriculture, Beltsville, MD, 20705, USA.
Sustainable Perennial Crops Laboratory, Agricultural Research Service, Department of Agriculture, Beltsville, MD, 20705, USA.
BMC Plant Biol. 2025 Aug 9;25(1):1050. doi: 10.1186/s12870-025-07128-y.
Cacao (Theobroma cacao L.) breeding and improvement rely on understanding germplasm diversity and trait architecture. This study characterized a cacao collection (173 accessions) evaluated in Puerto Rico, examining phenotypic diversity and trait interrelationships and comparing the results with published Trinidad and Colombia datasets. We also developed machine learning (ML) models for yield prediction and identified yield-associated SNP markers.
The cacao collection showed significant phenotypic variation and strong intra-collection trait correlations. Comparative analyses revealed conserved trait responses across environments, notably linking susceptibility to black pod rot in Puerto Rico with Witches' Broom Disease in Colombia, which suggests a broad-spectrum disease response mechanism. The ML models predicted yield effectively and quantified a hierarchy of predictor importance, with 'Total pods', 'Infection rate', and 'Pod weight' the most influential. By integrating existing SNP data for 28 common accessions, we identified multiple SNPs significantly associated with key horticultural traits, including 'Total pods', 'Infection rate', and 'Yield' (FDR < 0.01). Notably, a single marker on chromosome 5 (TcSNP475), located within a putative zinc finger stress-associated protein gene (Tc05_t008610), was associated with both 'Total pods' and 'Yield', making it a prime target for marker-assisted selection.
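To make the FDR < 0.01 threshold concrete, the following is a minimal sketch of how per-marker p-values could be screened at a false discovery rate. It assumes the Benjamini-Hochberg procedure, which the abstract does not name, and the marker names and p-values are hypothetical placeholders, not data from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each
    # adjusted value is never larger than the one ranked above it.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_markers(p_by_marker, fdr=0.01):
    """Return marker names whose adjusted p-value is below `fdr`."""
    names = list(p_by_marker)
    adjusted = benjamini_hochberg([p_by_marker[n] for n in names])
    return [n for n, q in zip(names, adjusted) if q < fdr]


# Hypothetical marker names and p-values, for illustration only.
example = {"snp_a": 0.0001, "snp_b": 0.004, "snp_c": 0.03, "snp_d": 0.5}
hits = significant_markers(example, fdr=0.01)
```

With these placeholder values, the adjusted p-values are roughly 0.0004, 0.008, 0.04, and 0.5, so only the first two markers pass the 0.01 threshold.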
This research provides a detailed characterization of a wide germplasm collection, robust yield predictors, and a suite of novel trait-linked genetic markers, offering valuable resources for cacao breeding. These integrated findings will provide a solid foundation for targeted breeding strategies and deeper molecular investigations into the mechanisms underpinning yield and stress resilience in this vital global crop.