How accurately can we control the FDR in analyzing microarray data?

Authors

Jung Sin-Ho, Jang Woncheol

Affiliation

Department of Biostatistics and Bioinformatics, Duke University, NC 27710, USA.

Publication information

Bioinformatics. 2006 Jul 15;22(14):1730-6. doi: 10.1093/bioinformatics/btl161. Epub 2006 Apr 27.

Abstract

We want to evaluate the performance of two FDR-based multiple testing procedures by Benjamini and Hochberg (1995, J. R. Stat. Soc. Ser. B, 57, 289-300) and Storey (2002, J. R. Stat. Soc. Ser. B, 64, 479-498) in analyzing real microarray data. These procedures commonly require independence or weak dependence of the test statistics. However, expression levels of different genes from each array are usually correlated due to coexpressing genes and various sources of errors from experiment-specific and subject-specific conditions that are not adjusted for in data analysis. Because of high dimensionality of microarray data, it is usually impossible to check whether the weak dependence condition is met for a given dataset or not. We propose to generate a large number of test statistics from a simulation model which has asymptotically (in terms of the number of arrays) the same correlation structure as the test statistics that will be calculated from the given data and to investigate how accurately the FDR-based testing procedures control the FDR on the simulated data. Our approach is to directly check the performance of these procedures for a given dataset, rather than to check the weak dependency requirement. We illustrate the proposed method with real microarray datasets, one where the clinical endpoint is disease group and another where it is survival.
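To make the idea concrete, the following is a minimal sketch of checking FDR control by simulation, not the authors' implementation. It applies the Benjamini-Hochberg step-up rule to test statistics drawn from a simple equicorrelated Gaussian model and averages the observed false discovery proportion. The equicorrelated model, the function names (`bh_reject`, `two_sided_p`, `estimate_fdr`) and all parameter values (`m`, `m1`, `rho`, `effect`) are illustrative assumptions; the abstract instead describes a simulation model whose correlation structure asymptotically matches that of the test statistics computed from the given data.

```python
import math
import random


def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg step-up rule: return a list of booleans (True = rejected)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose ordered p-value satisfies p_(rank) <= q * rank / m
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


def two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def estimate_fdr(m=200, m1=20, rho=0.3, effect=3.0, q=0.05, n_sim=200, seed=0):
    """Average false discovery proportion of BH over simulated datasets.

    ASSUMPTION: statistics are equicorrelated normals with pairwise correlation rho.
    This is a stand-in for a data-driven correlation structure, chosen only to
    keep the sketch short. The first m1 hypotheses are true alternatives.
    """
    rng = random.Random(seed)
    fdp_sum = 0.0
    for _ in range(n_sim):
        shared = rng.gauss(0.0, 1.0)  # common factor that induces the correlation
        pvals = []
        for j in range(m):
            noise = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            mean = effect if j < m1 else 0.0
            pvals.append(two_sided_p(mean + noise))
        rejected = bh_reject(pvals, q)
        n_rejected = sum(rejected)
        n_false = sum(rejected[m1:])  # rejections among the true nulls
        fdp_sum += (n_false / n_rejected) if n_rejected else 0.0
    return fdp_sum / n_sim


if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.6):
        print(f"rho={rho}: estimated FDR = {estimate_fdr(rho=rho):.3f} (nominal 0.05)")
```

Comparing the estimated FDR with the nominal level across values of `rho` mirrors the approach in the abstract: the procedure's performance is checked directly on simulated statistics rather than by testing the weak dependence condition itself.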

