Khalilisamani Nima, Li Zitong, Pettolino Filomena A, Moncuquet Philippe, Reverter Antonio, MacMillan Colleen P
Cotton Biotechnology, Agriculture and Food, CSIRO, Canberra, ACT, Australia.
Livestock and Aquatic Genomics, Agriculture and Food, CSIRO, St Lucia, QLD, Australia.
Front Plant Sci. 2024 Sep 20;15:1420837. doi: 10.3389/fpls.2024.1420837. eCollection 2024.
Cultivated cotton is the world's largest source of natural fibre, and yield and quality are the key traits of this renewable, biodegradable commodity. The cotton genome contains ~80K protein-coding genes, which makes precision breeding of complex traits a challenge. This study tested approaches to improving the genomic prediction (GP) accuracy of valuable cotton fibre traits to help accelerate precision breeding. A novel, biology-informed approach was tested for improving GP of key cotton fibre traits using transcriptomics from key time points in fibre development, namely fibre cells undergoing primary, transition, and secondary wall development. The three approaches tested were weighting of single-nucleotide polymorphisms (SNPs) in differentially expressed (DE) genes overall, in target DE gene lists informed by gene annotation, and, as a novel approach, in gene co-expression network (GCN) clusters created with partial correlation and information theory (PCIT) as the prior information in GP models. The GCN clusters were nucleated with known genes for fibre biomechanics, i.e., fasciclin-like arabinogalactan proteins, and the effect of cluster size was evaluated. The most promising improvements in GP accuracy came from the GCN clusters: 4.6% for cotton fibre elongation and 4.7% for strength, with cluster sizes of two and three neighbours proving most effective. Furthermore, these improvements were due to only a small number of SNPs, on the order of 30 per trait with the GCN cluster approach. Non-trait-specific biological time points and genes had neutral effects or even reduced GP accuracy for certain traits. Because the GCN clusters were generated from known genes for fibre biomechanics, they also identified additional candidate genes for fibre elongation and strength. These results demonstrate that GCN clusters make a specific and unique contribution to improving the GP of cotton fibre traits. The findings also indicate that biology-based GCNs could be incorporated into the GP models of genomic selection pipelines for cotton breeding to help improve precision breeding of target traits. The PCIT-GCN cluster approach may also have potential application in other crops and trees for enhancing the breeding of complex traits.
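The idea of weighting a small set of SNPs as prior information in a GP model can be illustrated with a minimal sketch. The abstract does not state which GP model or weighting scheme was used, so everything below is an assumption for illustration only: a VanRaden-style genomic relationship matrix with per-SNP weights, a ridge-type GBLUP predictor, simulated genotypes and phenotypes, a hypothetical list of 30 prior SNPs (`prior_snps`), and an arbitrary weight of 5.0.

```python
import numpy as np


def weighted_grm(genotypes, weights):
    """Genomic relationship matrix with per-SNP weights (illustrative).

    genotypes: (n_lines, n_snps) array of 0/1/2 allele counts.
    weights:   (n_snps,) array of non-negative SNP weights.
    """
    M = np.asarray(genotypes, dtype=float)
    w = np.asarray(weights, dtype=float)
    p = M.mean(axis=0) / 2.0          # allele frequency per SNP
    Z = M - 2.0 * p                   # centre each SNP column
    scale = np.sum(w * 2.0 * p * (1.0 - p))
    return (Z * w) @ Z.T / scale      # Z diag(w) Z^T / scale


def gblup_predict(G, y, train_idx, test_idx, lam=1.0):
    """Predict test phenotypes from training lines via ridge-type GBLUP.

    lam is an assumed ratio of residual to genetic variance.
    """
    G_tt = G[np.ix_(train_idx, train_idx)]
    G_st = G[np.ix_(test_idx, train_idx)]
    mu = y[train_idx].mean()
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)),
                            y[train_idx] - mu)
    return mu + G_st @ alpha


# Simulated data only; none of these values come from the study.
rng = np.random.default_rng(0)
n_lines, n_snps = 60, 200
genotypes = rng.binomial(2, 0.3, size=(n_lines, n_snps))

# Hypothetical prior: 30 SNPs assumed to lie in a co-expression cluster.
prior_snps = np.arange(30)
effects = rng.normal(size=prior_snps.size)
y = genotypes[:, prior_snps] @ effects + rng.normal(size=n_lines)

weights = np.ones(n_snps)
weights[prior_snps] = 5.0             # arbitrary up-weighting of prior SNPs

G_equal = weighted_grm(genotypes, np.ones(n_snps))
G_weighted = weighted_grm(genotypes, weights)

train_idx = np.arange(45)
test_idx = np.arange(45, n_lines)
preds_equal = gblup_predict(G_equal, y, train_idx, test_idx)
preds_weighted = gblup_predict(G_weighted, y, train_idx, test_idx)

# Prediction accuracy as the correlation of predicted and observed values.
acc_equal = np.corrcoef(preds_equal, y[test_idx])[0, 1]
acc_weighted = np.corrcoef(preds_weighted, y[test_idx])[0, 1]
print(f"equal weights: {acc_equal:.3f}, prior-weighted: {acc_weighted:.3f}")
```

In this sketch, accuracy is the correlation between predicted and observed phenotypes in held-out lines. Comparing the equal-weight and prior-weighted matrices mirrors, in a simplified way, the comparison the abstract describes; the simulated numbers say nothing about the reported 4.6% and 4.7% gains.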