Mahood Elizabeth H, Kruse Lars H, Moghe Gaurav D
Plant Biology Section, School of Integrative Plant Sciences, Cornell University, Ithaca, New York 14853, USA.
Appl Plant Sci. 2020 Jul 28;8(7):e11376. doi: 10.1002/aps3.11376. eCollection 2020 Jul.
Recent advances in sequencing and informatic technologies have led to a deluge of publicly available genomic data. While it is now relatively easy to sequence, assemble, and identify genic regions in diploid plant genomes, functional annotation of these genes is still a challenge. Over the past decade, there has been a steady increase in studies utilizing machine learning algorithms for various aspects of functional prediction, because these algorithms are able to integrate large amounts of heterogeneous data and detect patterns that are inconspicuous to rule-based approaches. The goal of this review is to introduce experimental plant biologists to machine learning by describing how it is currently being used in gene function prediction to gain novel biological insights. In this review, we discuss specific applications of machine learning in identifying structural features in sequenced genomes, predicting interactions between different cellular components, and predicting gene function and organismal phenotypes. Finally, we propose strategies for stimulating functional discovery using machine learning-based approaches in plants.