Tanzy Love, Alicia Carriquiry
Postdoctoral Fellow, Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY 14642
J Am Stat Assoc. 2009 Jun 1;104(486):524-540. doi: 10.1198/jasa.2009.0019.
We analyze data collected in a somatic embryogenesis experiment carried out on Zea mays at Iowa State University. The main objective of the study was to identify the set of maize genes that actively participate in embryo development. Embryo tissue was sampled and analyzed at various time points and under different culture media and light conditions. As in many microarray experiments, the operator scanned each slide multiple times to find the slide-specific 'optimal' laser and sensor settings. The multiple readings of each slide are repeated measurements on different scales with differing degrees of censoring, so they cannot be considered replicate measurements in the traditional sense. Yet it has been shown that the choice of reading can affect genetic inference. We propose a hierarchical modeling approach to estimating gene expression that combines all available readings on each spot and accounts for censoring in the observed values. We assess the statistical properties of the proposed expression estimates using a simulation experiment. As expected, combining all available scans with an approach that has good statistical properties yields expression estimates with noticeably lower bias and root mean squared error than other approaches proposed in the literature. Inferences drawn from the somatic embryogenesis experiment, which motivated this work, changed drastically depending on whether the data were analyzed with the standard approaches or with the methodology we propose.
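The abstract does not give the model itself. As a rough illustration only, and not the authors' hierarchical model, the sketch below shows the core idea of pooling several scans of one spot while respecting censoring. It assumes (hypothetically) that on a log scale each scan j reads `offsets[j] + slopes[j] * mu` plus normal noise with a common, known `sigma`; that offsets and slopes are already known; that any reading at or above a scanner `ceiling` is saturated, i.e. right-censored; and that the latent log-expression `mu` lies in the bracket [0, 20]. Under these assumptions `mu` is estimated by maximizing a Tobit-style censored normal likelihood. The names `spot_loglik` and `estimate_mu` are hypothetical helpers, not from the paper.

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def norm_logpdf(z: float) -> float:
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - LOG_SQRT_2PI


def norm_logsf(z: float) -> float:
    """Log of the standard normal upper tail, log(1 - Phi(z))."""
    if z > 30.0:
        # erfc underflows far in the tail; use the leading Mills-ratio term instead.
        return -0.5 * z * z - LOG_SQRT_2PI - math.log(z)
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))


def spot_loglik(mu, readings, offsets, slopes, sigma, ceiling):
    """Censored log-likelihood of one spot's latent log-expression mu.

    Hypothetical model: scan j reads offsets[j] + slopes[j] * mu + N(0, sigma^2)
    on a log scale; readings at or above `ceiling` are treated as saturated.
    """
    total = 0.0
    for y, a, b in zip(readings, offsets, slopes):
        m = a + b * mu
        if y >= ceiling:
            # Saturated reading: we only know the true value exceeded the ceiling.
            total += norm_logsf((ceiling - m) / sigma)
        else:
            # Fully observed reading: ordinary normal log density.
            total += norm_logpdf((y - m) / sigma) - math.log(sigma)
    return total


def estimate_mu(readings, offsets, slopes, sigma, ceiling, lo=0.0, hi=20.0, tol=1e-8):
    """Maximize spot_loglik over mu in [lo, hi] by golden-section search.

    The censored normal log-likelihood is concave in mu, so the search
    converges to its maximizer when that maximizer lies inside the bracket.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0

    def f(mu):
        return spot_loglik(mu, readings, offsets, slopes, sigma, ceiling)

    left, right = lo, hi
    c = right - inv_phi * (right - left)
    d = left + inv_phi * (right - left)
    fc, fd = f(c), f(d)
    while right - left > tol:
        if fc > fd:
            # Maximizer lies in [left, d]; reuse c as the new upper probe.
            right, d, fd = d, c, fc
            c = right - inv_phi * (right - left)
            fc = f(c)
        else:
            # Maximizer lies in [c, right]; reuse d as the new lower probe.
            left, c, fc = c, d, fd
            d = left + inv_phi * (right - left)
            fd = f(d)
    return (left + right) / 2.0


if __name__ == "__main__":
    # Two scans of one spot; the higher-gain scan saturates at the ceiling of 11.5.
    est = estimate_mu([10.0, 11.5], offsets=[0.0, 2.0], slopes=[1.0, 1.0],
                      sigma=0.5, ceiling=11.5)
    print(f"censoring-aware estimate of mu: {est:.3f}")
```

In the usage example, treating the saturated value 11.5 as exact would pull the estimate down to (10.0 + 9.5) / 2 = 9.75, whereas the censored likelihood places it above 10, because the saturated scan only says its true reading exceeded the ceiling. The paper's hierarchical approach goes further than this sketch, for example by not treating scan-specific parameters as known.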