Department of Biostatistics, JHU Bloomberg School of Public Health, Baltimore, MD, USA.
Bioinformatics. 2012 Mar 15;28(6):882-3. doi: 10.1093/bioinformatics/bts034. Epub 2012 Jan 17.
Heterogeneity and latent variables are now widely recognized as major sources of bias and variability in high-throughput experiments. The best-known source of latent variation in genomic experiments is batch effects, which arise when samples are processed on different days, in different groups or by different people. However, many other variables may also have a major impact on high-throughput measurements. Here we describe the sva package for identifying, estimating and removing unwanted sources of variation in high-throughput experiments. The sva package supports surrogate variable estimation with the sva function, direct adjustment for known batch effects with the ComBat function, and adjustment for batch and latent variables in prediction problems with the fsva function.
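To make the idea of adjusting for a known batch effect concrete, the following is a minimal Python sketch of per-batch mean centering for a single feature. It is not the ComBat algorithm and does not use the sva package, which is an R package. It is a deliberately simplified, location-only stand-in with no empirical Bayes shrinkage and no protection of covariates of interest. The function name `center_by_batch`, the sample values and the batch labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(values, batches):
    """Remove per-batch mean shifts from one feature's measurements.

    Each sample has its batch mean subtracted and the overall mean added
    back, so all batches end up with the same average. This is a
    simplified illustration, not the ComBat method.
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")

    grand_mean = mean(values)

    # Group the measurements by batch label.
    by_batch = defaultdict(list)
    for value, label in zip(values, batches):
        by_batch[label].append(value)

    batch_means = {label: mean(vals) for label, vals in by_batch.items()}

    # Shift every sample so that its batch mean equals the grand mean.
    return [
        value - batch_means[label] + grand_mean
        for value, label in zip(values, batches)
    ]


# Hypothetical measurements of one feature (for example, one gene)
# in six samples processed on two different days.
expression = [5.0, 6.0, 7.0, 9.0, 10.0, 11.0]
batch = ["day1", "day1", "day1", "day2", "day2", "day2"]

adjusted = center_by_batch(expression, batch)
print(adjusted)
```

After adjustment both batches share the same mean, while the differences between samples within each batch are preserved. Note that this naive centering would also remove a real biological difference if it happened to coincide with the batch, which is one reason more careful, model-based adjustment is used in practice.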