Canella Vieira Caio, Zhou Jing, Usovsky Mariola, Vuong Tri, Howland Amanda D, Lee Dongho, Li Zenglu, Zhou Jianfeng, Shannon Grover, Nguyen Henry T, Chen Pengyin
Fisher Delta Research, Extension, and Education Center, Division of Plant Science and Technology, University of Missouri, Portageville, MO, United States.
Biological Systems Engineering, University of Wisconsin-Madison, Madison, WI, United States.
Front Plant Sci. 2022 May 3;13:883280. doi: 10.3389/fpls.2022.883280. eCollection 2022.
Southern root-knot nematode [SRKN, Meloidogyne incognita (Kofoid & White) Chitwood] is a plant-parasitic nematode that is challenging to control because of its short life cycle, wide host range, and limited management options, among which genetic resistance is the main means of efficiently controlling the damage it causes. To date, a major quantitative trait locus (QTL) mapped on chromosome (Chr.) 10 plays an essential role in resistance to SRKN in soybean varieties. The confidence of trait-locus associations discovered by traditional methods is often limited by the assumptions that individual single nucleotide polymorphisms (SNPs) always act independently and that the phenotype follows a Gaussian distribution. Therefore, the objective of this study was to conduct machine learning (ML)-based genome-wide association studies (GWAS) utilizing Random Forest (RF) and Support Vector Machine (SVM) algorithms to unveil novel regions of the soybean genome associated with resistance to SRKN. A total of 717 breeding lines derived from 330 unique bi-parental populations were genotyped with the Illumina Infinium BARCSoySNP6K BeadChip and phenotyped for SRKN resistance in a greenhouse. A GWAS pipeline was proposed that combines supervised feature dimension reduction based on Variable Importance in Projection (VIP) with SNP detection based on classification accuracy. The proposed ML-GWAS methodology detected minor-effect SNPs that were not identified by the Bayesian-information and linkage-disequilibrium Iteratively Nested Keyway (BLINK), Fixed and Random Model Circulating Probability Unification (FarmCPU), or Enriched Compressed Mixed Linear Model (ECMLM) models. Besides the genomic region on Chr. 10 that explains most of the variance in SRKN resistance, additional minor-effect SNPs were identified on Chrs. 10 and 11.
The findings of this study demonstrated that overfitting in GWAS may lead to lower prediction accuracy, and that detecting significant SNPs based on classification accuracy limited false-positive associations. Expanding the genetic basis of resistance to SRKN can potentially reduce the selection pressure on the major QTL on Chr. 10 and help achieve higher levels of resistance.
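The VIP-based feature reduction mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the genotype matrix and phenotype are synthetic, the function name `vip_scores`, the choice of two PLS components, and the conventional VIP > 1 cutoff are all assumptions made for illustration, and the downstream RF/SVM classification step is not shown.

```python
import numpy as np


def vip_scores(X, y, n_components=2):
    """VIP scores from a single-response PLS model fitted with NIPALS.

    Hypothetical helper for illustration; assumes y varies and that
    n_components does not exceed the rank of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = X - X.mean(axis=0)          # centre each SNP column
    y = y - y.mean()                # centre the phenotype
    n_snps = X.shape[1]
    W = np.zeros((n_snps, n_components))   # unit-length weight vectors
    ss = np.zeros(n_components)            # phenotype variation explained
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w                   # component scores
        tt = t @ t
        p = X.T @ t / tt            # SNP loadings
        q = (y @ t) / tt            # phenotype loading
        X = X - np.outer(t, p)      # deflate genotypes
        y = y - q * t               # deflate phenotype
        W[:, a] = w
        ss[a] = q ** 2 * tt
    # Weighted sum of squared weights, scaled so that mean(VIP^2) == 1.
    return np.sqrt(n_snps * (W ** 2 @ ss) / ss.sum())


# Synthetic example: 200 lines, 50 SNPs coded 0/1/2 (all values assumed).
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(200, 50))
# SNP 7 mimics a major-effect locus, SNP 21 a minor-effect locus.
liability = 2.0 * genotypes[:, 7] + 0.5 * genotypes[:, 21] + rng.normal(size=200)
resistant = liability > np.median(liability)   # binary phenotype

vip = vip_scores(genotypes, resistant)
kept = np.flatnonzero(vip > 1.0)    # SNPs retained for the classifier step
print("SNPs with VIP > 1:", kept)
```

In a pipeline of the kind described, the retained SNPs would then be passed to a classifier such as RF or SVM, and SNPs would be judged by their contribution to cross-validated classification accuracy rather than by a p-value threshold.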