A Markov chain representation of the multiple testing problem.

Author information

Cabras Stefano

Affiliations

Department of Statistics, Universidad Carlos III de Madrid, Spain; Department of Mathematics and Informatics, Università di Cagliari, Italy.

Publication information

Stat Methods Med Res. 2018 Feb;27(2):364-383. doi: 10.1177/0962280216628903. Epub 2016 Mar 16.

DOI:10.1177/0962280216628903

PMID:26984908

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5808946/

Abstract

The problem of multiple hypothesis testing can be represented as a Markov process in which a new alternative hypothesis is accepted according to its evidence relative to the currently accepted one. This virtual, not formally observed, process provides the most probable set of non-null hypotheses given the data; it plays the same role as Markov Chain Monte Carlo in approximating a posterior distribution. To apply this representation and obtain the posterior probabilities over all alternative hypotheses, it is enough to have, for each test, barely defined Bayes Factors, e.g. Bayes Factors obtained up to an unknown constant. Such Bayes Factors may arise either from using default and improper priors or from calibrating p-values with respect to their corresponding Bayes Factor lower bound. Both sources of evidence are used to form a Markov transition kernel on the space of hypotheses. The approach leads to easily interpretable results and involves very simple formulas suitable for analyzing large datasets such as those arising from gene expression data (microarray or RNA-seq experiments).
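The idea of a chain that moves between hypotheses according to relative evidence can be illustrated with a minimal sketch. It is not the paper's transition kernel: the Metropolis-type acceptance rule, the uniform proposal, and the function names `bf_bound_for_alternative` and `visit_frequencies` are assumptions made for illustration. The p-value calibration used here is the well-known bound -e p log(p) on the Bayes Factor in favor of the null (valid for p < 1/e), whose reciprocal bounds the evidence for the alternative; the paper may use a different calibration. Because only ratios of evidence enter the acceptance step, any unknown common constant cancels.

```python
import math
import random
from collections import Counter


def bf_bound_for_alternative(p):
    """Reciprocal of the lower bound -e * p * ln(p) on the Bayes Factor
    for the null. Outside 0 < p < 1/e the bound is uninformative, so 1.0
    is returned (assumption: treat such p-values as no evidence)."""
    if not 0.0 < p < 1.0 / math.e:
        return 1.0
    return 1.0 / (-math.e * p * math.log(p))


def visit_frequencies(evidence, n_steps=20000, seed=0):
    """Run a Metropolis-type chain over hypothesis indices (illustrative).

    A uniformly proposed hypothesis replaces the current one with
    probability min(1, evidence[proposal] / evidence[current]), so the
    long-run visit frequencies are proportional to the evidence values.
    """
    rng = random.Random(seed)
    m = len(evidence)
    current = rng.randrange(m)
    counts = Counter()
    for _ in range(n_steps):
        proposal = rng.randrange(m)
        ratio = evidence[proposal] / evidence[current]
        if rng.random() < min(1.0, ratio):
            current = proposal
        counts[current] += 1
    return [counts[i] / n_steps for i in range(m)]


# Hypothetical p-values from four tests (made-up numbers, not from the paper).
p_values = [0.001, 0.04, 0.3, 0.2]
evidence = [bf_bound_for_alternative(p) for p in p_values]
freqs = visit_frequencies(evidence)
```

In this sketch the visit frequencies play the role of approximate posterior probabilities over the alternative hypotheses, under the stated assumptions. Multiplying every entry of `evidence` by the same constant leaves the chain unchanged, which mirrors the abstract's remark that Bayes Factors known only up to a constant suffice.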
