Wang Jingtian, Chen Ying, Shu Guoping, Zhao Miaomiao, Zheng Ao, Chang Xiaoyu, Li Guiqi, Wang Yibo, Zhang Yuan-Ming
College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China.
LongPing HighTech Maize Innovation Center, Zhengzhou 450041, China.
Plant Commun. 2025 Jul 14;6(7):101385. doi: 10.1016/j.xplc.2025.101385. Epub 2025 May 22.
Genetic dissection and breeding by design for polygenic traits remain substantial challenges. To address these challenges, it is important to identify as many genes as possible, including key regulatory genes. Here, we developed a genome-wide scanning plus machine learning framework, integrated with advanced computational techniques, to propose a novel algorithm named Fast3VmrMLM. This algorithm aims to enhance the identification of abundant and key genes for polygenic traits in the era of big data and artificial intelligence. The algorithm was extended to identify haplotype (Fast3VmrMLM-Hap) and molecular (Fast3VmrMLM-mQTL) variants. In simulation studies, Fast3VmrMLM outperformed existing methods in detecting dominant, small, and rare variants, requiring only 3.30 and 5.43 h (20 threads) to analyze the 18K rice and UK Biobank-scale datasets, respectively. Fast3VmrMLM identified more known (211) and candidate (384) genes for 14 traits in the 18K rice dataset than FarmCPU (100 known genes). Additionally, it identified 26 known and 24 candidate genes for seven yield-related traits in a maize NC II design; Fast3VmrMLM-mQTL identified two known soybean genes near structural variants. We demonstrated that this novel two-step framework outperformed genome-wide scanning alone. In breeding by design, a genetic network constructed via machine learning using all known and candidate genes identified in this study revealed 21 key genes associated with rice yield-related traits. All associated markers yielded high prediction accuracies in rice (0.7443) and maize (0.8492), enabling the development of superior hybrid combinations. A new breeding-by-design strategy based on the identified key genes was also proposed. This study provides an effective method for gene mining and breeding by design.