Gusnanto Arief, Calza Stefano, Pawitan Yudi
Medical Research Council - Biostatistics Unit, Institute of Public Health, Cambridge, UK.
Curr Opin Lipidol. 2007 Apr;18(2):187-93. doi: 10.1097/MOL.0b013e3280895d6f.
To highlight developments in microarray data analysis for the identification of differentially expressed genes, particularly through control of the false discovery rate.
The emergence of high-throughput technologies such as microarrays raises two fundamental statistical issues: multiplicity and sensitivity. We focus on the biological problem of identifying differentially expressed genes. First, multiplicity arises from testing tens of thousands of hypotheses simultaneously, which renders the standard P value meaningless. Second, known optimal single-test procedures such as the t-test perform poorly in the context of highly multiple tests. The standard approach to dealing with multiplicity is too conservative in the microarray context. The false discovery rate concept is fast becoming the key statistical assessment tool, replacing the P value. We review the false discovery rate approach and argue that it is more sensible for microarray data. We also discuss some methods that take into account additional information from the microarrays to improve the false discovery rate.
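The contrast between a conservative multiplicity correction and false discovery rate control can be illustrated with a minimal sketch. The abstract does not name specific procedures, so the choices here are assumptions: the Bonferroni correction stands in for the "standard approach", and the Benjamini-Hochberg step-up procedure stands in for false discovery rate control. The function names and the example P values are hypothetical and serve only as illustration.

```python
import numpy as np


def bonferroni(p_values, alpha=0.05):
    """Reject hypotheses whose P value is at most alpha / m (assumed 'standard approach')."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha / p.size


def benjamini_hochberg(p_values, q=0.05):
    """Reject hypotheses with the Benjamini-Hochberg step-up rule at FDR level q.

    Sort the P values, find the largest rank k with p_(k) <= q * k / m,
    and reject the k hypotheses with the smallest P values.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices of P values, smallest first
    thresholds = q * np.arange(1, m + 1) / m   # q * k / m for k = 1..m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()         # largest (0-based) rank meeting its threshold
        rejected[order[:k + 1]] = True         # reject everything up to that rank
    return rejected


# Hypothetical P values for ten genes (illustrative only, not from the article).
p_example = [0.041, 0.001, 0.205, 0.008, 0.039, 0.06, 0.074, 0.042, 0.212, 0.216]
n_bonferroni = int(bonferroni(p_example).sum())
n_bh = int(benjamini_hochberg(p_example).sum())
print(n_bonferroni, n_bh)
```

On these illustrative values, the false discovery rate rule declares more genes significant than the Bonferroni rule at the same nominal level, which is the sense in which the stricter correction is conservative. With tens of thousands of genes the gap typically widens, because the Bonferroni threshold shrinks in proportion to the number of tests.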
There is growing consensus on how to analyse microarray data using the false discovery rate framework in place of the classical P value. Further research is needed on the preprocessing of the raw data, such as the normalization step and filtering, and on finding the most sensitive test procedure.