Computationally efficient permutation-based confidence interval estimation for tail-area FDR.

Affiliation

Division of Biostatistics, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.

Publication information

Front Genet. 2013 Sep 17;4:179. doi: 10.3389/fgene.2013.00179. eCollection 2013.

Abstract

Challenges of satisfying parametric assumptions in genomic settings with thousands or millions of tests have led investigators to combine powerful False Discovery Rate (FDR) approaches with computationally expensive but exact permutation testing. We describe a computationally efficient permutation-based approach that includes a tractable estimator of the proportion of true null hypotheses, the variance of the log of tail-area FDR, and a confidence interval (CI) estimator, which accounts for the number of permutations conducted and dependencies between tests. The CI estimator applies a binomial distribution and an overdispersion parameter to counts of positive tests. The approach is general with regard to the distribution of the test statistic and performs favorably in comparison to other approaches, and reliable FDR estimates are demonstrated with as few as 10 permutations. An application of this approach to relate sleep patterns to gene expression patterns in mouse hypothalamus yielded a set of 11 transcripts associated with 24 h REM sleep [FDR = 0.15 (0.08, 0.26)]. Two of the corresponding genes, Sfrp1 and Sfrp4, are involved in Wnt signaling, and several others, Irf7, Ifit1, Iigp2, and Ifih1, have links to interferon signaling. These genes would have been overlooked had a typical a priori FDR threshold such as 0.05 or 0.1 been applied. The CI provides the flexibility to choose a significance threshold based on tolerance for false discoveries and precision of the FDR estimate. That is, it frees the investigator to use a more data-driven approach to define significance, such as the minimum estimated FDR, an option that is especially useful for weak effects, which are often observed in studies of complex diseases.
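To make the idea of a permutation-based tail-area FDR concrete, the sketch below computes a generic plug-in point estimate at a fixed threshold: the average number of permuted statistics in the tail divided by the number of observed statistics in the tail. It is an illustration only and is not the estimator described in the abstract. The function name `tail_area_fdr` and its arguments are hypothetical, the proportion of true null hypotheses is assumed to be 1 (a conservative simplification), and no confidence interval or overdispersion adjustment is computed.

```python
def tail_area_fdr(observed, permuted, threshold):
    """Plug-in tail-area FDR estimate at a given threshold (illustrative only).

    observed:  list of m observed test statistics (larger = more extreme)
    permuted:  list of B lists, each holding the m statistics recomputed
               after one permutation of the labels
    Returns (fdr_estimate, n_discoveries); fdr_estimate is None when no
    observed statistic reaches the threshold.
    """
    # Number of observed tests called positive at this threshold.
    n_discoveries = sum(1 for s in observed if s >= threshold)
    if n_discoveries == 0:
        return None, 0

    # Average number of positives per permutation, used as the expected
    # number of false positives under the null.
    expected_false = sum(
        sum(1 for s in perm if s >= threshold) for perm in permuted
    ) / len(permuted)

    # Assumption: proportion of true nulls taken as 1; cap the ratio at 1.
    return min(1.0, expected_false / n_discoveries), n_discoveries


# Toy data, invented purely for illustration.
observed = [0.1, 0.5, 2.5, 3.0, 3.5, 0.2]
permuted = [
    [0.1, 0.2, 2.6, 0.3, 0.4, 0.5],
    [0.1, 0.2, 0.3, 0.3, 0.4, 0.5],
]
fdr, n = tail_area_fdr(observed, permuted, threshold=2.0)
print(n, round(fdr, 3))
```

With only two permutations the estimate is very noisy, which is the kind of uncertainty a confidence interval that accounts for the number of permutations is meant to convey.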

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/861b/3775454/4848728d1423/fgene-04-00179-g0001.jpg
