Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588.
Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68588.
Proc Natl Acad Sci U S A. 2021 Mar 9;118(10). doi: 10.1073/pnas.2026330118.
Although genome-sequence assemblies are available for a growing number of plant species, gene-expression responses to stimuli have been cataloged for only a subset of these species. Many genes show altered transcription patterns in response to abiotic stresses. However, orthologous genes in related species often exhibit different responses to a given stress. Accordingly, data on the regulation of gene expression in one species are not reliable predictors of orthologous gene responses in a related species. Here, we trained a supervised classification model to identify genes that transcriptionally respond to cold stress. A model trained with only features calculated directly from genome assemblies exhibited only modest decreases in performance relative to models trained by using genomic, chromatin, and evolution/diversity features. Models trained with data from one species successfully predicted which genes would respond to cold stress in other related species. Cross-species predictions remained accurate when training was performed in cold-sensitive species and predictions were performed in cold-tolerant species and vice versa. Models trained with data on gene expression in multiple species provided at least equivalent performance to models trained and tested in a single species and outperformed single-species models in cross-species prediction. These results suggest that classifiers trained on stress data from well-studied species may suffice for predicting gene-expression patterns in related, less-studied species with sequenced genomes.
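The cross-species setup described above (train on one species, predict in another, using only features computed from sequence) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the classifier or the exact features, so a nearest-centroid classifier and GC content plus dinucleotide frequencies stand in for them, and the sequences and labels are invented toy data, not results from the study.

```python
from itertools import product


def kmer_features(seq, k=2):
    """Return [GC content] + normalized k-mer frequencies for a DNA sequence.

    Hypothetical stand-in for "features calculated directly from genome
    assemblies"; the study's real feature set is not given in the abstract.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
            total += 1
    gc = sum(seq.count(base) for base in "GC") / max(len(seq), 1)
    return [gc] + [counts[m] / max(total, 1) for m in kmers]


def fit_centroids(X, y):
    """Compute one mean feature vector per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


# Invented toy data: (sequence, label), where 1 = "cold-responsive" and
# 0 = "non-responsive". These are NOT real genes or real expression calls.
species_a = [
    ("GCGCCGGCGC", 1), ("CCGGCGCGGC", 1),
    ("ATATTAATAT", 0), ("TTAATATAAT", 0),
]
species_b = [("GGCCGCGCCG", 1), ("TATAATTATA", 0)]

# Train on species A only.
X_train = [kmer_features(s) for s, _ in species_a]
y_train = [lab for _, lab in species_a]
model = fit_centroids(X_train, y_train)

# Predict in species B, which the model never saw during training.
predictions = [predict(model, kmer_features(s)) for s, _ in species_b]
truth = [lab for _, lab in species_b]
accuracy = sum(p == t for p, t in zip(predictions, truth)) / len(truth)
print("cross-species accuracy on toy data:", accuracy)
```

The point of the sketch is the data flow rather than the model: features come only from sequence, the training and test sets come from different species, and a multi-species model would simply pool the labeled genes of several species into the training lists before fitting.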