Dobriban Edgar
Department of Statistics, The Wharton School, University of Pennsylvania, USA.
Inf Inference. 2018 Jun;7(2):251-275. doi: 10.1093/imaiai/iax013. Epub 2017 Dec 8.
Researchers in data-rich disciplines (think of computational genomics and observational cosmology) often wish to mine large bodies of P-values looking for significant effects, while controlling the false discovery rate or family-wise error rate. Increasingly, researchers also wish to prioritize certain hypotheses, for example, those thought to have larger effect sizes, by upweighting, and to impose constraints on the underlying mining, such as monotonicity along a certain sequence. We introduce Princessp, a principled method for performing weighted multiple testing by constrained convex optimization. Our method elegantly allows one to prioritize certain hypotheses through upweighting and to discount others through downweighting, while constraining the underlying weights involved in the mining process. When the P-values derive from monotone likelihood ratio families such as the Gaussian means model, the new method allows exact solution of an important optimal weighting problem previously thought to be non-convex and computationally infeasible. Our method scales to massive data set sizes. We illustrate the applications of Princessp on a series of standard genomics data sets and offer comparisons with several previous 'standard' methods. Princessp offers both ease of operation and the ability to scale to extremely large problem sizes. The method is available as open-source software from github.com/dobriban/pvalue_weighting_matlab (accessed 11 October 2017).
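To make the idea of upweighting and downweighting concrete, the following is a minimal Python sketch of the classical weighted Bonferroni rule, in which hypothesis i is rejected when its P-value is at most alpha * w_i / m and the weights average to one. It is not the Princessp method and does not perform the constrained convex optimization described above; the weights here are simply supplied by hand. The function name `weighted_bonferroni` and the example numbers are assumptions made purely for illustration.

```python
def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Illustrative weighted Bonferroni test (hypothetical helper, not Princessp).

    Rejects hypothesis i when p_i <= alpha * w_i / m, after rescaling the
    weights so that they average to 1. The thresholds then sum to alpha,
    which keeps the family-wise error rate at or below alpha.
    """
    m = len(p_values)
    if len(weights) != m:
        raise ValueError("need exactly one weight per P-value")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    # Rescale so the weights sum to m (that is, average to 1).
    scaled = [w * m / total for w in weights]

    # Per-hypothesis decision: True means "reject the null".
    return [p <= alpha * w / m for p, w in zip(p_values, scaled)]


# Made-up P-values for four hypotheses.
p = [0.001, 0.02, 0.04, 0.30]

# Equal weights reduce to the ordinary Bonferroni threshold alpha / m.
unweighted = weighted_bonferroni(p, [1, 1, 1, 1])

# Upweight the second hypothesis and downweight the other three.
weighted = weighted_bonferroni(p, [0.5, 2.5, 0.5, 0.5])

print(unweighted)
print(weighted)
```

In this toy example the second hypothesis is rejected only when it is upweighted, at the cost of stricter thresholds for the downweighted hypotheses. Choosing such weights optimally under constraints is the problem the abstract addresses.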