Fast3VmrMLM: A fast algorithm that integrates genome-wide scanning with machine learning to accelerate gene mining and breeding by design for polygenic traits in large-scale GWAS datasets.

Authors

Wang Jingtian, Chen Ying, Shu Guoping, Zhao Miaomiao, Zheng Ao, Chang Xiaoyu, Li Guiqi, Wang Yibo, Zhang Yuan-Ming

Affiliations

College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China.

LongPing HighTech Maize Innovation Center, Zhengzhou 450041, China.

Publication information

Plant Commun. 2025 Jul 14;6(7):101385. doi: 10.1016/j.xplc.2025.101385. Epub 2025 May 22.

Abstract

Genetic dissection and breeding by design for polygenic traits remain substantial challenges. To address these challenges, it is important to identify as many genes as possible, including key regulatory genes. Here, we developed a genome-wide scanning plus machine learning framework, integrated with advanced computational techniques, to propose a novel algorithm named Fast3VmrMLM. This algorithm aims to enhance the identification of abundant and key genes for polygenic traits in the era of big data and artificial intelligence. The algorithm was extended to identify haplotype (Fast3VmrMLM-Hap) and molecular (Fast3VmrMLM-mQTL) variants. In simulation studies, Fast3VmrMLM outperformed existing methods in detecting dominant, small, and rare variants, requiring only 3.30 and 5.43 h (20 threads) to analyze the 18K rice and UK Biobank-scale datasets, respectively. Fast3VmrMLM identified more known (211) and candidate (384) genes for 14 traits in the 18K rice dataset than FarmCPU (100 known genes). Additionally, it identified 26 known and 24 candidate genes for seven yield-related traits in a maize NC II design; Fast3VmrMLM-mQTL identified two known soybean genes near structural variants. We demonstrated that this novel two-step framework outperformed genome-wide scanning alone. In breeding by design, a genetic network constructed via machine learning using all known and candidate genes identified in this study revealed 21 key genes associated with rice yield-related traits. All associated markers yielded high prediction accuracies in rice (0.7443) and maize (0.8492), enabling the development of superior hybrid combinations. A new breeding-by-design strategy based on the identified key genes was also proposed. This study provides an effective method for gene mining and breeding by design.
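The "genome-wide scanning plus machine learning" idea in the abstract can be illustrated with a minimal sketch. This is not Fast3VmrMLM: it omits the mixed model, population-structure correction, and dominance effects, and it swaps in a plain correlation scan followed by a lasso fit as a generic stand-in for a two-step procedure. The simulated data and all function names and parameters (`simulate`, `scan`, `lasso`, `keep`, `lam`) are assumptions made for illustration only.

```python
import random


def simulate(n=300, m=300, causal=(10, 150, 280), effects=(1.0, -0.8, 0.6), seed=1):
    """Simulate 0/1/2 genotypes and a phenotype driven by a few causal markers (toy data)."""
    rng = random.Random(seed)
    X = [[rng.choice((0, 1, 2)) for _ in range(m)] for _ in range(n)]
    y = [sum(b * row[j] for j, b in zip(causal, effects)) + rng.gauss(0.0, 1.0) for row in X]
    return X, y


def center(v):
    """Subtract the mean from a list of numbers."""
    mu = sum(v) / len(v)
    return [a - mu for a in v]


def scan(X, y, keep=20):
    """Step 1 (assumed form): single-marker scan ranking markers by squared correlation with y."""
    yc = center(y)
    syy = sum(a * a for a in yc)
    scores = []
    for j in range(len(X[0])):
        xc = center([row[j] for row in X])
        sxx = sum(a * a for a in xc)
        sxy = sum(a * b for a, b in zip(xc, yc))
        r2 = (sxy * sxy) / (sxx * syy) if sxx > 0 and syy > 0 else 0.0
        scores.append((r2, j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:keep]]


def soft(a, lam):
    """Soft-thresholding operator used by the lasso update."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def lasso(X, y, cols, lam=0.15, iters=100):
    """Step 2 (assumed form): joint sparse fit over the candidates by coordinate descent.

    Minimizes (1/2n) * ||y - Z b||^2 + lam * ||b||_1 on centered data.
    """
    n = len(y)
    Z = {j: center([row[j] for row in X]) for j in cols}
    zz = {j: sum(a * a for a in Z[j]) / n for j in cols}
    beta = {j: 0.0 for j in cols}
    resid = center(y)
    for _ in range(iters):
        for j in cols:
            if zz[j] == 0.0:
                continue
            z = Z[j]
            # Covariance of marker j with the residual, with its own effect added back.
            rho = sum(a * r for a, r in zip(z, resid)) / n + zz[j] * beta[j]
            new = soft(rho, lam) / zz[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * a for r, a in zip(resid, z)]
                beta[j] = new
    return {j: b for j, b in beta.items() if b != 0.0}


X, y = simulate()
candidates = scan(X, y)               # step 1: shortlist from the genome-wide scan
selected = lasso(X, y, candidates)    # step 2: multi-locus selection among the shortlist
print(sorted(selected.items()))
```

In this toy setting the scan only shortlists markers, and the joint fit then estimates their effects together, which is the general motivation for combining a scan with a multi-locus model rather than relying on marginal tests alone.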

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8a4d/12281254/4f4f6993a079/gr1.jpg
