Airoldi Edoardo M, Costa Thiago, Bassetti Federico, Leisen Fabrizio, Guindani Michele
Department of Statistics at Harvard University and an Alfred P. Sloan Research Fellow.
School of Engineering and Applied Sciences at Harvard University.
J Am Stat Assoc. 2014 Dec 1;109(508):1466-1480. doi: 10.1080/01621459.2014.950735.
Many popular Bayesian nonparametric priors can be characterized in terms of exchangeable species sampling sequences. In some applications, however, exchangeability may not be appropriate. We introduce a novel and probabilistically coherent family of non-exchangeable species sampling sequences characterized by a tractable predictive probability function whose weights are driven by a sequence of independent Beta random variables. We compare their theoretical clustering properties with those of the Dirichlet process and the two-parameter Poisson-Dirichlet process. Unlike existing work, the proposed construction provides a complete characterization of the joint process. We then propose using this process as a prior distribution in a hierarchical Bayes modeling framework, and we describe a Markov chain Monte Carlo sampler for posterior inference. We evaluate the performance of the prior and the robustness of the resulting inference in a simulation study, comparing it with popular Dirichlet process mixtures and hidden Markov models. Finally, we develop an application to the detection of chromosomal aberrations in breast cancer using array CGH data.
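The abstract does not state the predictive probability function explicitly. As a rough illustration only, the sketch below assumes one weighting scheme consistent with the description: given latent weights W_1, ..., W_n drawn independently from a Beta distribution, observation n+1 copies the species of observation j with probability (1 - W_j) * W_{j+1} * ... * W_n, or starts a new species with probability W_1 * ... * W_n. The function names, the constant Beta parameters `a` and `b`, and the integer species labels are all assumptions made for this example, not details taken from the abstract.

```python
import random


def predictive_weights(w):
    """Return (p, r) for latent weights w = [W_1, ..., W_n].

    p[j] is the assumed probability of copying the species of observation j+1,
    r is the assumed probability of drawing a new species.
    """
    n = len(w)
    p = [0.0] * n
    tail = 1.0  # running product W_{j+1} * ... * W_n
    for j in range(n - 1, -1, -1):
        p[j] = (1.0 - w[j]) * tail
        tail *= w[j]
    return p, tail  # tail now equals W_1 * ... * W_n


def sample_sequence(n, a=1.0, b=1.0, seed=0):
    """Simulate n species labels under the assumed Beta-driven predictive rule."""
    rng = random.Random(seed)
    labels = [0]      # the first observation always starts species 0
    next_label = 1
    w = []
    for _ in range(1, n):
        w.append(rng.betavariate(a, b))  # one new latent weight per step
        p, r = predictive_weights(w)
        u = rng.random()
        acc = 0.0
        chosen = None
        for j, pj in enumerate(p):
            acc += pj
            if u < acc:
                chosen = labels[j]  # copy the species of an earlier observation
                break
        if chosen is None:          # remaining mass r: start a new species
            chosen = next_label
            next_label += 1
        labels.append(chosen)
    return labels
```

Because the weight attached to an earlier observation depends on its position in the sequence rather than only on cluster sizes, reordering the observations changes their joint probability, which is the sense in which such a sequence is non-exchangeable. Calling `sample_sequence(20, a=2.0, b=1.0)` returns a list of 20 integer labels; the specific Beta parameters here are arbitrary.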