Konishi Tomokazu
Faculty of Bioresource Sciences, Akita Prefectural University, Akita 010-0195, Japan.
BMC Syst Biol. 2011;5 Suppl 2(Suppl 2):S6. doi: 10.1186/1752-0509-5-S2-S6. Epub 2011 Dec 14.
Microarray technology has enabled the measurement of comprehensive transcriptomic information. However, each data entry may reflect trivial individual differences among samples and may also contain technical noise. The certainty of each observed difference should therefore be confirmed at an early step of the analysis, and statistical tests are frequently used for this purpose. Because microarrays analyze a huge number of genes simultaneously, concerns about multiplicity, i.e. the family-wise error rate (FWER) and the false discovery rate (FDR), have been raised in testing the data. To deal with these concerns, several compensation methodologies have been proposed, but they make the tests so conservative that arbitrary tuning of the threshold has been introduced to relax the conditions. Unexpectedly, however, the appropriateness of the test methodologies, the concerns about multiplicity, and the compensation methodologies have not been sufficiently verified.
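To make the notion of "compensation" concrete, the following is a minimal sketch of two widely used adjustments, Bonferroni (which targets FWER) and Benjamini-Hochberg (which targets FDR). The abstract does not name specific procedures, so the choice of these two, the function names, and the p-values are all illustrative assumptions rather than the methods or data examined in the paper.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    m = len(pvalues)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_rejections(pvalues, alpha=0.05):
    """Number of tests called significant at the given threshold."""
    return sum(p <= alpha for p in pvalues)


# Hypothetical per-gene p-values, invented for illustration only.
raw_p = [0.001, 0.008, 0.02, 0.03, 0.04, 0.2, 0.5, 0.9]

print("raw:       ", count_rejections(raw_p))
print("Bonferroni:", count_rejections(bonferroni(raw_p)))
print("BH:        ", count_rejections(benjamini_hochberg(raw_p)))
```

With these made-up values, both adjustments call fewer genes significant than the unadjusted test at the same threshold, which is the conservativeness the abstract refers to.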
Appropriateness was checked by comparing each methodology's premises with the statistical characteristics of data obtained from two typical microarray platforms. As expected, within-group data differences were normally distributed, supporting the use of t-test and F-test statistics. However, each gene displayed its own tendency in the magnitude of variation, and the distributions of p-values were rather complex. These characteristics are inconsistent with the premises underlying the compensation methodologies, which assume that most of the null hypotheses are true. The evidence also raised questions about the multiplicity concerns themselves. In transcriptomic studies, FWER should not be critical, because analyses at higher levels would not be influenced by a few false positives. In addition, concerns about FDR are not suited to sharp null hypotheses on expression levels.
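One simple way to probe the premise that most null hypotheses are true is to look at how the p-values are distributed: if nearly all nulls held, the p-values would be close to uniform on [0, 1]. The sketch below bins p-values into a histogram and applies a Storey-type estimate of the null fraction, (number of p-values above λ) / (m(1 − λ)). This estimator is not mentioned in the abstract and is used here only as an illustration; the function names, the value λ = 0.5, and the p-values are assumptions.

```python
def pvalue_histogram(pvalues, bins=10):
    """Count p-values in equal-width bins covering [0, 1]."""
    counts = [0] * bins
    for p in pvalues:
        # p == 1.0 would index past the end, so clamp to the last bin.
        idx = min(int(p * bins), bins - 1)
        counts[idx] += 1
    return counts


def estimate_null_fraction(pvalues, lam=0.5):
    """Storey-type estimate of the fraction of true null hypotheses.

    Under a true null, p-values are uniform, so a fraction (1 - lam) of
    them is expected above lam; the observed count is rescaled accordingly.
    """
    m = len(pvalues)
    tail = sum(p > lam for p in pvalues)
    return min(1.0, tail / (m * (1.0 - lam)))


# Hypothetical per-gene p-values, invented for illustration only.
raw_p = [0.001, 0.008, 0.02, 0.03, 0.04, 0.2, 0.5, 0.9]

print(pvalue_histogram(raw_p))
print(estimate_null_fraction(raw_p))
```

A histogram that piles up near zero and a low estimated null fraction would both be at odds with the "most nulls are true" premise. With real data the estimate depends on the choice of λ and on the test statistics being well calibrated.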
Therefore, although compensation methods have been recommended to deal with the problem of multiplicity, the compensations are actually inappropriate for transcriptome analyses. Compensations are not only unnecessary, but will increase the occurrence of false negative errors, and arbitrary adjustment of the threshold damages the objectivity of the tests. Rather, the results of parametric tests should be evaluated directly.