Expression-based machine learning models for predicting plant tissue identity.

Author information

Palande Sourabh, Arsenault Jeremy, Basurto-Lozada Patricia, Bleich Andrew, Brown Brianna N I, Buysse Sophia F, Connors Noelle A, Das Adhikari Sikta, Dobson Kara C, Guerra-Castillo Francisco Xavier, Guerrero-Carrillo Maria F, Harlow Sophia, Herrera-Orozco Héctor, Hightower Asia T, Izquierdo Paulo, Jacobs MacKenzie, Johnson Nicholas A, Leuenberger Wendy, Lopez-Hernandez Alessandro, Luckie-Duque Alicia, Martínez-Avila Camila, Mendoza-Galindo Eddy J, Plancarte David Cruz, Schuster Jenny M, Shomer Harry, Sitar Sidney C, Steensma Anne K, Thomson Joanne Elise, Villaseñor-Amador Damián, Waterman Robin, Webster Brandon M, Whyte Madison, Zorilla-Azcué Sofía, Montgomery Beronda L, Husbands Aman Y, Krishnan Arjun, Percival Sarah, Munch Elizabeth, VanBuren Robert, Chitwood Daniel H, Rougon-Cardoso Alejandra

Affiliations

Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, USA.

Department of Computer Science and Engineering, Michigan State University, East Lansing, Michigan, USA.

Publication information

Appl Plant Sci. 2024 Oct 19;13(1):e11621. doi: 10.1002/aps3.11621. eCollection 2025 Jan-Feb.

DOI:10.1002/aps3.11621

PMID:39906497

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11788907/

Abstract

PREMISE

The selection of Arabidopsis as a model organism played a pivotal role in advancing genomic science. The competing frameworks, which would have selected an agricultural- or ecological-based model species, were rejected in favor of building knowledge in a species that would facilitate genome-enabled research.

METHODS

Here, we examine the ability of models based on Arabidopsis gene expression data to predict tissue identity in other flowering plants. Across the machine learning algorithms compared, models trained and tested on Arabidopsis data achieved near-perfect precision and recall, whereas models trained on Arabidopsis data and used to predict tissue identity across the flowering plants achieved precision values of 0.69 to 0.74 and recall values of 0.54 to 0.64.
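To make the two reported metrics concrete, the following minimal sketch computes per-tissue precision and recall from true and predicted tissue labels, then averages them across tissues. The labels are invented toy values, the function names are hypothetical, and macro-averaging is an assumption; the abstract does not state how the reported values were averaged.

```python
from collections import Counter


def precision_recall(true_labels, predicted_labels):
    """Return {label: (precision, recall)} for every label in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            tp[truth] += 1   # correct call for this tissue
        else:
            fp[pred] += 1    # predicted tissue was wrong
            fn[truth] += 1   # true tissue was missed
    scores = {}
    for label in sorted(set(true_labels) | set(predicted_labels)):
        n_predicted = tp[label] + fp[label]
        n_actual = tp[label] + fn[label]
        precision = tp[label] / n_predicted if n_predicted else 0.0
        recall = tp[label] / n_actual if n_actual else 0.0
        scores[label] = (precision, recall)
    return scores


def macro_average(scores):
    """Unweighted mean of precision and recall across labels (an assumption)."""
    n = len(scores)
    mean_precision = sum(p for p, _ in scores.values()) / n
    mean_recall = sum(r for _, r in scores.values()) / n
    return mean_precision, mean_recall


# Toy labels for illustration only; not data from the study.
y_true = ["root", "root", "leaf", "leaf", "flower", "flower"]
y_pred = ["root", "root", "leaf", "root", "flower", "leaf"]

per_tissue = precision_recall(y_true, y_pred)
macro_precision, macro_recall = macro_average(per_tissue)
print(per_tissue)
print(round(macro_precision, 2), round(macro_recall, 2))
```

Precision asks how many samples assigned to a tissue truly belong to it, while recall asks how many samples of a tissue were recovered, so a model can score well on one and poorly on the other.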

RESULTS

The identity of belowground tissue can be predicted more accurately than that of other tissue types, and the ability to predict tissue identity is not correlated with phylogenetic distance from Arabidopsis. k-nearest neighbors is the most successful algorithm, suggesting that gene expression signatures, rather than marker genes, are more valuable for building models for tissue and cell type prediction in plants.
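A k-nearest neighbors classifier labels a sample by comparing its whole expression profile against labeled reference profiles, which is why its success points toward signatures rather than single marker genes. The sketch below is a minimal illustration, not the authors' pipeline: the expression values are invented, and the choice of Euclidean distance, k = 3, and the function name `knn_predict` are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_profiles, train_labels, query, k=3):
    """Label a query profile by majority vote among its k nearest training profiles."""
    # Distance from the query to every labeled training profile, nearest first.
    distances = sorted(
        (math.dist(profile, query), label)
        for profile, label in zip(train_profiles, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy expression profiles (three hypothetical genes per sample), illustration only.
train_profiles = [
    (9.0, 1.0, 0.5), (8.5, 1.2, 0.7), (9.2, 0.8, 0.4),  # reference root samples
    (1.0, 8.0, 0.6), (1.3, 7.5, 0.9), (0.8, 8.4, 0.5),  # reference leaf samples
]
train_labels = ["root", "root", "root", "leaf", "leaf", "leaf"]

# Hypothetical samples from another species, measured on the same three genes.
print(knn_predict(train_profiles, train_labels, (8.0, 2.0, 0.6)))
print(knn_predict(train_profiles, train_labels, (1.5, 7.0, 1.0)))
```

Because every gene contributes to the distance, no single gene has to behave as a clean marker in the new species for the sample to land near the right reference profiles.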

DISCUSSION

Our data-driven results show that the assertion that knowledge from Arabidopsis is translatable to other plants does not always hold. Given the current abundance of sequencing data, we should reevaluate the scientific emphasis on Arabidopsis and prioritize plant diversity.
