Mummadi Sai Teja, Islam Md Khairul, Busov Victor, Wei Hairong
Department of Computer Science, Michigan Technological University, Houghton, MI, USA.
Computational Science and Engineering, Michigan Technological University, Houghton, MI, USA.
For Res (Fayettev). 2025 Jul 30;5:e014. doi: 10.48130/forres-0025-0014. eCollection 2025.
Construction of gene regulatory networks (GRNs) is essential for elucidating the regulatory mechanisms underlying metabolic pathways, biological processes, and complex traits. In this study, we developed and evaluated machine learning, deep learning, and hybrid approaches for constructing GRNs by integrating prior knowledge and large-scale transcriptomic data from Arabidopsis thaliana, poplar, and maize. Among these, hybrid models that combined convolutional neural networks and machine learning consistently outperformed traditional machine learning and statistical methods, achieving over 95% accuracy on the holdout test datasets. These models not only identified a greater number of known transcription factors regulating the lignin biosynthesis pathway but also ranked key master regulators such as MYB46 and MYB83, as well as many upstream regulators, including members of the VND, NST, and SND families, at the top of candidate lists with higher precision. To address the challenge of limited training data in non-model species, we implemented transfer learning, enabling cross-species GRN inference by applying models trained on a well-characterized, data-rich species to another species with limited data. This strategy enhanced model performance and demonstrated the feasibility of knowledge transfer across species. Overall, our findings underscore the effectiveness of hybrid and transfer learning approaches in GRN prediction, offering a scalable framework for elucidating regulatory mechanisms in both model and non-model plant systems.