Department of Computer Science, Stanford University, Stanford, California, USA.
Nat Methods. 2012 Nov;9(11):1120-5. doi: 10.1038/nmeth.2207. Epub 2012 Oct 14.
Measuring complete gene expression profiles for a large number of experiments is costly. We propose an approach in which a small subset of probes is selected based on a preliminary set of full expression profiles. In subsequent experiments, only the subset is measured, and the missing values are imputed. We developed several algorithms to simultaneously select probes and impute missing values, and we demonstrate that these 'probe selection for imputation' (PSI) algorithms can successfully reconstruct missing gene expression values in a wide variety of applications, as evaluated using multiple biologically relevant metrics. We analyze the performance of PSI methods under varying conditions, provide guidelines for choosing the optimal method based on the experimental setting, and indicate how to estimate imputation accuracy. Finally, we apply our approach to a large-scale study of immune system variation.
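The abstract does not specify how the PSI algorithms work. As a minimal sketch under our own assumptions, and not the authors' method, the two-phase idea (select probes on preliminary full profiles, then impute the unmeasured values in later samples) can be illustrated with greedy forward selection scored by leave-one-out k-nearest-neighbour (kNN) imputation error. The kNN imputer, the squared-error criterion, and the function names `knn_impute`, `loo_error` and `greedy_select` are hypothetical choices made for illustration.

```python
import math
from statistics import mean


def knn_impute(train, probes, measured, k=3):
    """Reconstruct a full profile from values measured only at `probes`.

    Hypothetical illustration, not the published PSI algorithm.
    train    -- list of full training profiles (one list of floats per sample)
    probes   -- indices of the measured probes
    measured -- observed values, aligned with `probes`
    """
    # Rank training profiles by Euclidean distance over the measured probes only
    # (sorted() is stable, so ties keep their original order).
    ranked = sorted(
        train,
        key=lambda row: math.sqrt(sum((row[p] - v) ** 2 for p, v in zip(probes, measured))),
    )
    neighbours = ranked[:k]
    # Average the k nearest full profiles to fill in every probe ...
    estimate = [mean(row[g] for row in neighbours) for g in range(len(train[0]))]
    # ... then keep the values that were actually measured.
    for p, v in zip(probes, measured):
        estimate[p] = v
    return estimate


def loo_error(train, probes, k=3):
    """Leave-one-out squared imputation error on the preliminary full profiles."""
    total = 0.0
    for i, row in enumerate(train):
        others = train[:i] + train[i + 1:]
        est = knn_impute(others, probes, [row[p] for p in probes], k)
        total += sum((e - x) ** 2 for e, x in zip(est, row))
    return total


def greedy_select(train, n_probes, k=3):
    """Greedy forward selection: repeatedly add the probe that most lowers LOO error."""
    selected = []
    while len(selected) < n_probes:
        candidates = [g for g in range(len(train[0])) if g not in selected]
        best = min(candidates, key=lambda g: loo_error(train, selected + [g], k))
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Toy preliminary data: probe 0 is constant, probes 1 and 2 carry the signal.
    preliminary = [[0, 1, 1], [0, 5, 5], [0, 9, 9], [0, 13, 13]]
    print(sorted(greedy_select(preliminary, 2, k=1)))  # [1, 2]
```

On this toy data the uninformative constant probe is never chosen, because measuring it does nothing to narrow down the neighbours. Selection is scored with the same imputer that is later applied to new samples, which loosely mirrors the abstract's idea of choosing probes and imputing values jointly; a real study would likely use a stronger imputer (for example, regression on the selected probes) and far more samples.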