Mollah Mohammad Manir Hossain, Jamal Rahman, Mokhtar Norfilza Mohd, Harun Roslan, Mollah Md Nurul Haque
Institut Perubatan Molekul UKM (UMBI), Universiti Kebangsaan Malaysia (UKM), Jalan Ya'acob Latiff, Bandar Tun Razak, Cheras 56000 Kuala Lumpur, Malaysia.
Institut Perubatan Molekul UKM (UMBI), Universiti Kebangsaan Malaysia (UKM), Jalan Ya'acob Latiff, Bandar Tun Razak, Cheras 56000 Kuala Lumpur, Malaysia; Department of Physiology, Faculty of Medicine, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia.
PLoS One. 2015 Sep 28;10(9):e0138810. doi: 10.1371/journal.pone.0138810. eCollection 2015.
Identifying genes that are differentially expressed (DE) between two or more conditions with multiple patterns of expression is one of the primary objectives of gene expression data analysis. Several statistical approaches, including one-way analysis of variance (ANOVA), are used to identify DE genes. However, most of these methods provide misleading results for two or more conditions with multiple patterns of expression in the presence of outlying genes. In this paper, an attempt is made to develop a hybrid one-way ANOVA approach that unifies the robustness and efficiency of estimation using the minimum β-divergence method to overcome some problems that arise in the existing robust methods for both small- and large-sample cases with multiple patterns of expression.
The proposed method relies on a β-weight function, which produces values between 0 and 1. The β-weight function with β = 0.2 is used as a measure of outlier detection. It assigns smaller weights (≥ 0) to outlying expressions and larger weights (≤ 1) to typical expressions. The distribution of the β-weights is used to calculate the cut-off point, which is compared to the observed β-weight of an expression to determine whether that gene expression is an outlier. This weight function plays a key role in unifying the robustness and efficiency of estimation in one-way ANOVA.
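The weighting idea can be illustrated with a minimal sketch. The abstract does not state the formula, so the code assumes the Gaussian-form β-weight commonly used in the minimum β-divergence literature, w = exp(-β(x - μ)² / (2σ²)), together with the usual fixed-point updates for μ and σ². The function names, the median/MAD starting values, and the fixed cut-off of 0.1 are all hypothetical; in particular, the paper derives its cut-off from the distribution of the β-weights, which is not reproduced here.

```python
import math
import statistics


def beta_weight(x, mu, sigma2, beta=0.2):
    """Assumed Gaussian-form beta-weight: 1 at x == mu, decaying toward 0 far from mu."""
    return math.exp(-beta * (x - mu) ** 2 / (2.0 * sigma2))


def robust_mean_var(xs, beta=0.2, n_iter=50):
    """Fixed-point iteration for a beta-weighted mean and variance (illustrative only)."""
    # Robust starting values (an assumption): median and squared scaled MAD.
    mu = statistics.median(xs)
    mad = statistics.median(abs(x - mu) for x in xs)
    sigma2 = max((1.4826 * mad) ** 2, 1e-12)
    for _ in range(n_iter):
        ws = [beta_weight(x, mu, sigma2, beta) for x in xs]
        total = sum(ws)
        mu = sum(w * x for w, x in zip(ws, xs)) / total
        # The (1 + beta) factor offsets the shrinkage caused by down-weighting.
        sigma2 = (1.0 + beta) * sum(w * (x - mu) ** 2 for w, x in zip(ws, xs)) / total
        sigma2 = max(sigma2, 1e-12)
    return mu, sigma2


def flag_outliers(xs, beta=0.2, cutoff=0.1):
    """Flag expressions whose beta-weight falls below a hypothetical fixed cut-off."""
    mu, sigma2 = robust_mean_var(xs, beta)
    return [beta_weight(x, mu, sigma2, beta) < cutoff for x in xs]


if __name__ == "__main__":
    # Made-up expression values for one gene in one condition; the last is an outlier.
    expr = [5.0, 5.2, 4.9, 5.1, 4.8, 5.0, 12.0]
    print(flag_outliers(expr))
```

Because typical expressions keep weights close to 1, the weighted estimates behave much like the classical mean and variance when no outliers are present, which is one way to read the claim that the weight function unifies robustness and efficiency.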
Analyses of simulated gene expression profiles revealed that all eight methods (ANOVA, SAM, LIMMA, EBarrays, eLNN, KW, robust BetaEB and the proposed method) perform almost identically for m = 2 conditions in the absence of outliers. However, the robust BetaEB method and the proposed method exhibited considerably better performance than the other six methods in the presence of outliers. In this case, the BetaEB method exhibited slightly better performance than the proposed method for the small-sample cases, but the proposed method exhibited much better performance than the BetaEB method for both the small- and large-sample cases in the presence of more than 50% outlying genes. The proposed method also exhibited better performance than the other methods for m > 2 conditions with multiple patterns of expression, a setting to which the BetaEB method has not been extended. Therefore, the proposed approach would be more suitable and reliable on average for the identification of DE genes between two or more conditions with multiple patterns of expression.