Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA.
Bioinformatics. 2011 Jan 1;27(1):63-9. doi: 10.1093/bioinformatics/btq558. Epub 2010 Oct 29.
Most biological traits may be correlated with underlying gene expression patterns, which are partially determined by DNA sequence variation. The correlations between gene expression levels and quantitative traits are essential for understanding the functions of genes and for dissecting gene regulatory networks.
In the present study, we adopted a novel statistical method, the stochastic expectation-maximization (SEM) algorithm, to analyze the associations between gene expression levels and quantitative trait values and to identify genetic loci controlling the variation in gene expression. In the first step, genes, represented by their expression levels measured in microarray experiments, were assigned to two clusters based on the strength of their association with the phenotypes of the quantitative trait under investigation. In the second step, the genes associated with the trait were mapped to genetic loci in the genome. Because gene expression levels are quantitative, the genetic loci controlling these expression traits are called expression quantitative trait loci (eQTL). We applied the SEM algorithm to a real dataset from a barley genetic experiment that recorded both quantitative traits and gene expression traits. For the first time, we identified genes associated with eight agronomic traits of barley. These genes were then mapped to the seven chromosomes of the barley genome. The SEM algorithm and the results of the barley data analysis are useful to scientists in bioinformatics and plant breeding.
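The two-step procedure can be illustrated with a minimal Python sketch. It is not the authors' R program and makes several assumptions not stated in the abstract: each gene is first summarized by a standardized association statistic (for example, a regression slope of expression on the trait divided by its standard error); the two clusters are modeled as zero-mean normals with a small variance (not associated) and a large variance (associated); and the mapping step is replaced by a simple single-marker LOD scan over 0/1-coded markers. The function names `sem_two_clusters` and `marker_lod` are hypothetical.

```python
import math
import random


def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def sem_two_clusters(stats, n_iter=300, burn_in=100, seed=0):
    """Stochastic EM for a hypothetical two-cluster model of per-gene statistics.

    Assumed model (illustrative only):
      cluster 0 (not trait-associated): stat ~ N(0, var0)
      cluster 1 (trait-associated):     stat ~ N(0, var1), with var1 > var0
    Returns each gene's posterior probability of cluster 1, averaged over
    the iterations after burn-in.
    """
    rng = random.Random(seed)
    n = len(stats)
    m2 = sum(z * z for z in stats) / n
    pi1, var0, var1 = 0.5, 0.5 * m2, 2.0 * m2  # crude starting values
    post_sum = [0.0] * n
    for it in range(n_iter):
        # S-step: draw each gene's cluster label from its current posterior.
        labels = []
        for j, z in enumerate(stats):
            w1 = pi1 * normal_pdf(z, var1)
            w0 = (1.0 - pi1) * normal_pdf(z, var0)
            p1 = w1 / (w0 + w1) if w0 + w1 > 0 else 0.5
            if it >= burn_in:
                post_sum[j] += p1
            labels.append(1 if rng.random() < p1 else 0)
        # M-step: re-estimate parameters from the sampled ("completed") labels.
        ss, cnt = [0.0, 0.0], [0, 0]
        for z, k in zip(stats, labels):
            ss[k] += z * z
            cnt[k] += 1
        # Keep both clusters non-empty in probability.
        pi1 = min(max(cnt[1] / n, 1.0 / n), 1.0 - 1.0 / n)
        if cnt[0]:
            var0 = max(ss[0] / cnt[0], 1e-6)
        if cnt[1]:
            var1 = max(ss[1] / cnt[1], 1e-6)
        if var1 < var0:  # resolve label switching: cluster 1 is the wider one
            var0, var1, pi1 = var1, var0, 1.0 - pi1
    return [s / (n_iter - burn_in) for s in post_sum]


def marker_lod(expr, genotypes):
    """Single-marker LOD scan of one gene's expression (a simple stand-in).

    genotypes[m][i] is individual i's genotype (0 or 1) at marker m.
    LOD = (n / 2) * log10(RSS_null / RSS_marker).
    """
    n = len(expr)
    mean = sum(expr) / n
    rss0 = sum((y - mean) ** 2 for y in expr)
    if rss0 == 0:
        return [0.0] * len(genotypes)  # constant expression: no signal
    lods = []
    for g in genotypes:
        rss1 = 0.0
        for level in (0, 1):
            group = [y for y, x in zip(expr, g) if x == level]
            if group:
                gm = sum(group) / len(group)
                rss1 += sum((y - gm) ** 2 for y in group)
        lods.append(0.5 * n * math.log10(rss0 / max(rss1, 1e-12)))
    return lods
```

In use, genes whose averaged posterior from `sem_two_clusters` is high (for example, above 0.5) would be treated as trait-associated, and only those genes would be passed to `marker_lod`, with the highest-scoring marker suggesting a candidate eQTL position. Sampling labels in the S-step, rather than carrying fractional weights as ordinary EM does, is what makes the procedure stochastic; averaging the posteriors after burn-in smooths out that sampling noise.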
The R program for the SEM algorithm can be downloaded from our website: http://www.statgen.ucr.edu.