Powerful and interpretable control of false discoveries in two-group differential expression studies.

Author information

Enjalbert-Courrech Nicolas, Neuvial Pierre

Affiliation

Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, CNRS, UPS, F-31062 Toulouse Cedex 9, France.

Publication information

Bioinformatics. 2022 Nov 30;38(23):5214-5221. doi: 10.1093/bioinformatics/btac693.

DOI:10.1093/bioinformatics/btac693

PMID:36264124

Abstract

MOTIVATION

The standard approach for statistical inference in differential expression (DE) analyses is to control the false discovery rate (FDR). However, controlling the FDR does not in fact imply that the proportion of false discoveries is upper bounded. Moreover, no statistical guarantee can be given on subsets of genes selected by FDR thresholding. These known limitations are overcome by post hoc inference, which provides guarantees on the number or proportion of false discoveries among arbitrary gene selections. However, post hoc inference methods are not yet widely used for DE studies.

RESULTS

In this article, we demonstrate the relevance and illustrate the performance of adaptive interpolation-based post hoc methods for two-group DE studies. First, we formalize the use of permutation-based methods to obtain sharp confidence bounds that are adaptive to the dependence between genes. Then, we introduce a generic linear time algorithm for computing post hoc bounds, making these bounds applicable to large-scale two-group DE studies. The use of the resulting Adaptive Simes bound is illustrated on an RNA sequencing study. Comprehensive numerical experiments based on real microarray and RNA sequencing data demonstrate the statistical performance of the method.
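
To make the notion of a post hoc bound concrete, the following is a minimal Python sketch of the classical (non-adaptive) Simes interpolation bound on the number of false positives in an arbitrary gene selection. It is an illustration under stated assumptions, not the authors' implementation: the function name `simes_posthoc_bound` and its parameters are hypothetical, the thresholds are the uncalibrated Simes thresholds alpha * k / m rather than the permutation-calibrated ones of the Adaptive Simes bound, and the computation sorts the selected p-values instead of using the linear time algorithm described in the article.

```python
import numpy as np


def simes_posthoc_bound(p_selected, m, alpha=0.05):
    """Upper bound on the number of false positives among selected hypotheses.

    Hypothetical helper for illustration only.
    p_selected: p-values of the selected genes (any subset of the m tested).
    m: total number of hypotheses tested.
    alpha: target risk level of the bound.
    """
    p = np.sort(np.asarray(p_selected, dtype=float))
    s = p.size
    if s == 0:
        return 0
    # For k > s the term (k - 1) already reaches the trivial bound s,
    # so only k = 1..min(s, m) can improve on it.
    k = np.arange(1, min(s, m) + 1)
    thresholds = alpha * k / m  # plain Simes thresholds (an assumption here)
    # Number of selected p-values that are >= each threshold.
    n_above = s - np.searchsorted(p, thresholds, side="left")
    # Interpolation bound: min over k of (#selected outside R_k) + (k - 1),
    # capped by the size of the selection.
    return int(min(s, np.min(n_above + k - 1)))


# Example: 4 selected genes out of m = 100 tested.
bound = simes_posthoc_bound([0.0001, 0.0002, 0.0003, 0.2], m=100)
print(bound)  # at most this many of the 4 selected genes are false positives
```

Dividing the returned value by the size of the selection gives the corresponding bound on the false discovery proportion. Because the bound holds simultaneously over all selections, it remains valid when the selection was chosen after looking at the data.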

AVAILABILITY AND IMPLEMENTATION

A cross-platform open source implementation within the R package sanssouci is available at https://sanssouci-org.github.io/sanssouci/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
