Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, 100094, China.
National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China.
Plant J. 2022 Sep;111(6):1527-1538. doi: 10.1111/tpj.15905. Epub 2022 Jul 27.
Advances in high-throughput omics technologies are leading plant biology research into the era of big data. Machine learning (ML) plays an important role in plant systems biology because of its strong performance and broad applicability in big-data analysis. However, supervised ML algorithms require large numbers of labeled samples as training data to achieve good performance. In some cases, obtaining enough labeled training data is impossible or prohibitively expensive; in such cases, the unsupervised learning (UL) and semi-supervised learning (SSL) paradigms play an indispensable role. In this review, we first introduce the basic concepts of ML techniques and some representative UL and SSL algorithms, including clustering, dimensionality reduction, self-supervised learning (self-SL), positive-unlabeled (PU) learning and transfer learning. We then review recent advances and applications of UL and SSL paradigms in both plant systems biology and plant phenotyping research. Finally, we discuss the limitations of these approaches and highlight the significance and challenges of UL and SSL strategies in plant systems biology.