Azad Ariful, Rajwa Bartek, Pothen Alex
Computational Research Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Rd, Berkeley, 94720, CA, USA.
Bindley Bioscience Center, Purdue University, West Lafayette, 47907, IN, USA.
BMC Bioinformatics. 2016 Jul 28;17:291. doi: 10.1186/s12859-016-1083-9.
Comparing phenotypes of heterogeneous cell populations from multiple biological conditions is at the heart of scientific discovery based on flow cytometry (FC). When the biological signal is measured by the average expression of a biomarker, standard statistical methods require that variance be approximately stabilized in populations to be compared. Since the mean and variance of a cell population are often correlated in fluorescence-based FC measurements, a preprocessing step is needed to stabilize the within-population variances.
We present flowVS, a variance-stabilization algorithm that removes the mean-variance correlation from cell populations identified in each fluorescence channel. flowVS transforms each channel from all samples of a dataset with the inverse hyperbolic sine (asinh) transformation. For each channel, the parameters of the transformation are optimally selected using Bartlett's likelihood-ratio test so that the populations attain homogeneous variances. The optimum parameters are then used to transform the corresponding channels in every sample. flowVS is therefore an explicit variance-stabilization method: it stabilizes within-population variances in each channel by evaluating the homoskedasticity of clusters with a likelihood-ratio test. With two publicly available datasets, we show that flowVS removes the mean-variance dependence from raw FC data and makes the within-population variance relatively homogeneous. We demonstrate that alternative transformation techniques such as flowTrans, flowScape, logicle, and FCSTrans might not stabilize variance. Besides flow cytometry, flowVS can also be applied to stabilize variance in microarray data. With a publicly available dataset, we demonstrate that flowVS performs as well as the VSN software, a state-of-the-art approach developed for microarrays.
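The parameter selection described above can be illustrated with a minimal Python sketch. It is not the flowVS implementation (which is distributed through Bioconductor), and it rests on several assumptions that the abstract does not state: the transformation is written as asinh(x / c) with a single cofactor c per channel, the populations of one channel are taken as already identified, the cofactor is chosen by a plain grid search that minimizes Bartlett's statistic, and the data are synthetic. The names `bartlett_statistic`, `asinh_transform`, and `best_cofactor` are hypothetical.

```python
import math
import random
import statistics


def bartlett_statistic(groups):
    """Bartlett's test statistic for equal variances across groups.

    Larger values indicate stronger evidence of unequal variances.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]  # sample variances
    total = sum(sizes)
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (total - k)
    numerator = (total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (
        sum(1 / (n - 1) for n in sizes) - 1 / (total - k)
    ) / (3 * (k - 1))
    return numerator / correction


def asinh_transform(values, cofactor):
    """Assumed parameterization: asinh(x / cofactor)."""
    return [math.asinh(v / cofactor) for v in values]


def best_cofactor(populations, cofactors):
    """Grid search for the cofactor that minimizes Bartlett's statistic."""
    scored = [
        (bartlett_statistic([asinh_transform(p, c) for p in populations]), c)
        for c in cofactors
    ]
    stat, cofactor = min(scored)
    return cofactor, stat


# Synthetic channel with three populations whose spread grows with their mean.
rng = random.Random(0)
true_cofactor = 150.0  # illustrative value only
populations = [
    [true_cofactor * math.sinh(rng.gauss(mu, 0.3)) for _ in range(500)]
    for mu in (1.0, 3.0, 5.0)
]

grid = [10 ** (e / 4) for e in range(0, 17)]  # cofactors from 1 to 10^4
raw_stat = bartlett_statistic(populations)
chosen_cofactor, chosen_stat = best_cofactor(populations, grid)

print("Bartlett statistic, raw data:", raw_stat)
print("chosen cofactor:", chosen_cofactor)
print("Bartlett statistic, transformed data:", chosen_stat)
```

The same chosen cofactor would then be applied to that channel in every sample, mirroring the step described above. A grid search is used here only for simplicity; any one-dimensional optimizer could take its place.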
Homogeneity of variance in cell populations across FC samples is desirable when extracting features uniformly and when comparing cell populations with different levels of marker expression. The newly developed flowVS algorithm addresses the variance-stabilization problem in FC and microarrays by optimally transforming data with the help of Bartlett's likelihood-ratio test. On two publicly available FC datasets, flowVS stabilizes within-population variances more evenly than existing transformation and normalization techniques. flowVS-based variance stabilization can help in comparing and aligning phenotypically identical cell populations across different samples. flowVS and the datasets used in this paper are publicly available in Bioconductor.