Department of Plant Biology, University of California, Davis, California 95616, USA.
Plant Physiol. 2013 Jun;162(2):537-52. doi: 10.1104/pp.112.213546. Epub 2013 Apr 12.
Developmental differences between species commonly result from changes in the tissue-specific expression of genes. Clustering algorithms are a powerful means to detect coexpression across tissues in single species but are not often applied to multidimensional data sets, such as gene expression across tissues in multiple species. As next-generation sequencing approaches enable interspecific analyses, methods to visualize and explore such data sets will be required. Here, we analyze a data set comprising gene expression profiles across six different tissue types in domesticated tomato (Solanum lycopersicum) and a wild relative (Solanum pennellii). We find that self-organizing maps are a useful means to analyze interspecies data, as orthologs can be assigned to independent levels of a "super self-organizing map." We compare various clustering approaches using a principal component analysis in which the expression of orthologous pairs is indicated by two points. We leverage the expression profile differences between orthologs to look at tissue-specific changes in gene expression between species. Clustering based on expression differences between species (rather than absolute expression profiles) yields groups of genes with large tissue-by-species interactions. The changes in expression profiles of genes we observe reflect differences in developmental architecture, such as changes in meristematic activity between S. lycopersicum and S. pennellii. Together, our results offer a suite of data-exploration methods that will be important to visualize and make biological sense of next-generation sequencing experiments designed explicitly to discover tissue-by-species interactions in gene expression data.
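The idea of comparing orthologs by their expression differences, and of plotting each orthologous pair as two points in a principal component analysis, can be illustrated with a minimal sketch. This is not the authors' pipeline. The toy data, the per-gene z-score scaling, the helper names `zscore_rows` and `pca_scores`, and the use of a simple per-gene divergence score in place of a clustering step are all assumptions made for illustration.

```python
import numpy as np

def zscore_rows(x):
    """Scale each gene's profile across tissues to mean 0 and unit variance."""
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # flat profiles stay at zero instead of dividing by zero
    return (x - mu) / sd

def pca_scores(x, n_components=2):
    """Project rows onto the top principal components via SVD."""
    centered = x - x.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Hypothetical toy data: 4 orthologous pairs x 6 tissues (values are made up).
rng = np.random.default_rng(0)
sly = rng.gamma(2.0, 10.0, size=(4, 6))  # stands in for S. lycopersicum
spe = sly.copy()                         # stands in for S. pennellii
spe[0] = sly[0][::-1]  # gene 0: tissue pattern reversed in the second species
spe[1] = sly[1] * 3.0  # gene 1: same tissue pattern, higher overall level

# Scale within each species so that only the shape of the profile matters.
sly_z, spe_z = zscore_rows(sly), zscore_rows(spe)

# Per-tissue difference between orthologs; a large norm suggests a
# tissue-by-species interaction rather than a uniform shift in level.
diff = sly_z - spe_z
divergence = np.linalg.norm(diff, axis=1)

# PCA in which every orthologous pair contributes two points:
# rows 0-3 are one species, rows 4-7 are the matching orthologs.
scores = pca_scores(np.vstack([sly_z, spe_z]))
```

Under these assumptions, gene 1 has a divergence of essentially zero because scaling removes the overall level difference, while gene 0 stands out because its tissue pattern differs between species. In practice the rows of `diff` could be passed to a clustering method such as a self-organizing map instead of being summarized by a single norm.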