Ahn Ezekiel, Prom Louis K, Park Sunchung, Lee Dongho, Bhatt Jishnu, Ellur Vishnutej, Lim Seunghyun, Jang Jae Hee, Lakshman Dilip, Magill Clint
Sustainable Perennial Crops Laboratory, Agricultural Research Service, United States Department of Agriculture, Beltsville, MD, USA.
Insect Control and Cotton Disease Research, Agricultural Research Service, Southern Plains Agricultural Research Center, United States Department of Agriculture, College Station, TX, USA.
Heredity (Edinb). 2025 Jul 19. doi: 10.1038/s41437-025-00783-9.
Plant disease resistance is often a complex, polygenic trait, making its genetic dissection with traditional genome-wide association studies (GWAS) challenging. Grain mold in sorghum, a devastating disease caused by a fungal complex, exemplifies this complexity. We hypothesized that a machine learning (ML)-driven GWAS, employing diverse phenotypic representations from a panel of 306 sorghum accessions, could more effectively unravel the genetic basis of resistance. Phenotypic data, including raw disease scores, a 'difference phenotype' (inoculated vs. control), and principal components, were analyzed using Boosted Tree and Bootstrap Forest models, demonstrating strong explanatory power for phenotypic variance when trained on the entire dataset. This ML-GWAS approach confirmed a highly polygenic architecture for grain mold resistance, identifying numerous SNPs across the sorghum genome. Notably, several SNPs were consistently associated with resistance across multiple analytical models and phenotypic representations. These robustly identified SNPs were frequently located near genes with predicted functions integral to plant defense. Gene ontology (GO) analyses of the candidate gene set confirmed enrichment in categories supporting roles in pathogen recognition, DNA repair, and stress response modulation, indicating a multifaceted defense mechanism. This study provides valuable candidate genes for breeding sorghum with enhanced grain mold resistance and offers a refined methodological framework for dissecting complex traits in this crop. The successful application of this ML-based strategy in sorghum suggests its potential utility for studying similar complex traits in other plant species.
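The abstract describes two bookkeeping steps that are easy to state concretely: forming a per-accession "difference phenotype" from inoculated and control scores, and keeping the SNPs that recur across several model and phenotype combinations. Below is a minimal sketch of those two steps under stated assumptions, not the authors' pipeline. The helper names `difference_phenotype` and `consensus_snps`, the sign convention (inoculated minus control), the `top_k`/`min_runs` consensus rule, and all example values are hypothetical. The sketch does not fit Boosted Tree or Bootstrap Forest models. It assumes each fitted model has already produced an importance score per SNP.

```python
from collections import Counter
from typing import Dict, List


def difference_phenotype(inoculated: Dict[str, float],
                         control: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical helper: per-accession inoculated score minus control score.

    The sign convention is an assumption; accessions missing either score are dropped.
    """
    shared = inoculated.keys() & control.keys()
    return {acc: inoculated[acc] - control[acc] for acc in sorted(shared)}


def consensus_snps(importance_by_run: Dict[str, Dict[str, float]],
                   top_k: int,
                   min_runs: int) -> List[str]:
    """Hypothetical helper: SNPs ranked in the top_k of at least min_runs runs.

    importance_by_run maps a run label (model + phenotype representation) to a
    {snp_id: importance} dict. The top_k / min_runs rule is an assumed criterion.
    """
    hits = Counter()
    for scores in importance_by_run.values():
        # Rank SNPs by importance (descending); ties broken by SNP id for determinism.
        ranked = sorted(scores, key=lambda snp: (-scores[snp], snp))
        hits.update(ranked[:top_k])
    # Most frequently recovered SNPs first, then by SNP id.
    return sorted((snp for snp, n in hits.items() if n >= min_runs),
                  key=lambda snp: (-hits[snp], snp))


if __name__ == "__main__":
    # Toy, made-up scores for illustration only.
    inoc = {"acc1": 4.0, "acc2": 2.5, "acc3": 3.0}
    ctrl = {"acc1": 1.0, "acc2": 2.0, "acc4": 1.5}
    print(difference_phenotype(inoc, ctrl))  # {'acc1': 3.0, 'acc2': 0.5}

    runs = {
        "BoostedTree:raw": {"snpA": 0.30, "snpB": 0.10, "snpC": 0.25},
        "BootstrapForest:raw": {"snpA": 0.20, "snpB": 0.22, "snpC": 0.05},
        "BoostedTree:difference": {"snpA": 0.40, "snpB": 0.01, "snpC": 0.35},
    }
    print(consensus_snps(runs, top_k=2, min_runs=2))  # ['snpA', 'snpC']
```

Counting recurrence across runs, rather than trusting a single model's ranking, mirrors the abstract's emphasis on SNPs that were consistently associated with resistance across models and phenotypic representations. The thresholds here are illustrative and would need to be chosen for real data.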