Datta Jyotishka, Dunson David B
Department of Mathematical Sciences, University of Arkansas, Fayetteville, Arkansas 72701, U.S.A.
Department of Statistical Science, Duke University, Durham, North Carolina 27708, U.S.A.
Biometrika. 2016 Dec;103(4):971-983. doi: 10.1093/biomet/asw053. Epub 2016 Dec 8.
There is growing interest in analysing high-dimensional count data, which often exhibit quasi-sparsity corresponding to an overabundance of zeros and small nonzero counts. Existing methods for analysing multivariate count data via Poisson or negative binomial log-linear hierarchical models with zero-inflation cannot flexibly adapt to quasi-sparse settings. We develop a new class of continuous local-global shrinkage priors tailored to quasi-sparse counts. Theoretical properties are assessed, including flexible posterior concentration and stronger control of false discoveries in multiple testing. Simulation studies demonstrate excellent small-sample properties relative to competing methods. We use the method to detect rare mutational hotspots in exome sequencing data and to identify North American cities most impacted by terrorism.