Department of Statistics, Purdue University, West Lafayette, IN 47907, USA.
Bioinformatics. 2011 Aug 15;27(16):2173-80. doi: 10.1093/bioinformatics/btr359. Epub 2011 Jun 17.
High-throughput perturbation screens measure the phenotypes of thousands of biological samples under various conditions. The phenotypes measured in the screens are subject to substantial biological and technical variation. At the same time, to achieve high throughput, it is often impossible to include a large number of replicates or to randomize their order throughout the screens. Distinguishing true changes in the phenotype from stochastic variation in such experimental designs is extremely challenging and requires adequate statistical methodology.
We propose a statistical modeling framework that is based on experimental designs with at least two controls profiled throughout the experiment, and a normalization and variance estimation procedure with linear mixed-effects models. We evaluate the framework using three comprehensive screens of Saccharomyces cerevisiae, which involve 4940 single-gene knock-out haploid mutants, 1127 single-gene knock-out diploid mutants and 5798 single-gene overexpression haploid strains. We show that the proposed approach (i) can be used in conjunction with practical experimental designs; (ii) allows extensions to alternative experimental workflows; (iii) enables a sensitive discovery of biologically meaningful changes; and (iv) strongly outperforms the existing noise reduction procedures.
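As a rough illustration of the idea of control-based normalization and variance estimation, the sketch below computes ANOVA (method-of-moments) variance components for a balanced one-way random-effects model fitted to control measurements, and centers a sample on the controls of its own batch. It is a simplified stand-in, not the HTSmix procedure or its API: the batch structure, the balanced design, the function names `variance_components` and `normalize`, and the toy numbers are all assumptions made for illustration.

```python
from statistics import mean


def variance_components(batches):
    """ANOVA (method-of-moments) estimates for a balanced one-way
    random-effects model: y_ij = mu + batch_i + error_ij.

    batches: list of equal-length lists of control measurements,
             one inner list per batch (hypothetical layout).
    Returns (between_batch_variance, within_batch_variance).
    """
    n_batches = len(batches)
    n_per = len(batches[0]) if batches else 0
    if n_batches < 2 or n_per < 2 or any(len(b) != n_per for b in batches):
        raise ValueError(
            "need a balanced design with >= 2 batches and >= 2 controls per batch"
        )

    batch_means = [mean(b) for b in batches]
    grand_mean = mean(batch_means)

    # Mean square between batches and mean square within batches.
    msb = n_per * sum((m - grand_mean) ** 2 for m in batch_means) / (n_batches - 1)
    msw = sum(
        (y - m) ** 2 for b, m in zip(batches, batch_means) for y in b
    ) / (n_batches * (n_per - 1))

    # The moment estimator can be negative; truncate at zero.
    between = max((msb - msw) / n_per, 0.0)
    return between, msw


def normalize(value, batch_controls):
    """Center a sample on the mean of the controls from the same batch."""
    return value - mean(batch_controls)


# Toy data (invented): two batches, two control measurements each.
controls = [[1.0, 3.0], [5.0, 7.0]]
between_var, within_var = variance_components(controls)
normalized = normalize(10.0, controls[1])
```

In this toy setting the between-batch component captures systematic drift across the screen, while the within-batch component reflects replicate-level noise among the controls. A full linear mixed-effects fit would estimate these jointly with the effects of interest and would handle unbalanced designs.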
All experimental datasets are publicly available at www.ionomicshub.org. The R package HTSmix is available at http://www.stat.purdue.edu/~ovitek/HTSmix.html.
Supplementary data are available at Bioinformatics online.