Department of Data Sciences and Operations, University of Southern California, Los Angeles, California, USA.
Department of Statistics, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
Biometrics. 2021 Sep;77(3):1037-1049. doi: 10.1111/biom.13422. Epub 2021 Jan 27.
Changepoint detection methods are used in many areas of science and engineering, for example, in the analysis of copy number variation data to detect abnormalities in copy numbers along the genome. Despite the broad array of available tools, methodology for quantifying our uncertainty in the strength (or the presence) of given changepoints post-selection is lacking. Post-selection inference offers a framework to fill this gap, but the most straightforward application of these methods results in low-powered hypothesis tests and leaves open several important questions about practical usability. In this work, we carefully tailor post-selection inference methods toward changepoint detection, focusing on copy number variation data. To accomplish this, we study commonly used changepoint algorithms: binary segmentation, two of its most popular variants (wild and circular binary segmentation), and the fused lasso. We implement some of the latest developments in post-selection inference theory, mainly auxiliary randomization. This improves power, but carrying out the resulting tests requires implementing Markov chain Monte Carlo algorithms (importance sampling and hit-and-run sampling). We also provide recommendations for improving practical usability, detailed simulations, and example analyses on array comparative genomic hybridization as well as sequencing data.
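For readers unfamiliar with binary segmentation, the selection step can be illustrated with a minimal sketch. This code does not come from the paper: it assumes the fixed-number-of-steps variant with the standard CUSUM statistic, the function names (`cusum`, `best_split`, `binary_segmentation`) and the toy data are hypothetical, and it covers only changepoint selection, not the post-selection tests or the auxiliary randomization described above.

```python
from math import sqrt


def cusum(y, s, b, e):
    """Scaled difference in means between y[b:e] and y[s:b]."""
    n_left, n_right = b - s, e - b
    mean_left = sum(y[s:b]) / n_left
    mean_right = sum(y[b:e]) / n_right
    return sqrt(n_left * n_right / (n_left + n_right)) * (mean_right - mean_left)


def best_split(y, s, e):
    """Return (|CUSUM|, b) for the best split of y[s:e], or None if too short."""
    if e - s < 2:
        return None
    return max((abs(cusum(y, s, b, e)), b) for b in range(s + 1, e))


def binary_segmentation(y, num_steps):
    """Greedy binary segmentation run for a fixed number of steps.

    Returns sorted changepoint indices; an index b means the mean
    shifts between y[b - 1] and y[b].
    """
    segments = [(0, len(y))]
    changepoints = []
    for _ in range(num_steps):
        # Score the best split inside every current segment.
        candidates = []
        for s, e in segments:
            split = best_split(y, s, e)
            if split is not None:
                candidates.append((split[0], split[1], s, e))
        if not candidates:
            break
        # Keep the single split with the largest |CUSUM| overall.
        _, b, s, e = max(candidates)
        changepoints.append(b)
        segments.remove((s, e))
        segments.extend([(s, b), (b, e)])
    return sorted(changepoints)


# Toy signal (made up for illustration): mean near 0, then 2, then -1.
y = [0.1, -0.2, 0.0, 0.2, 2.1, 1.9, 2.0, 2.2, -1.0, -1.1, -0.9, -1.0]
print(binary_segmentation(y, 2))
```

Because the changepoints are chosen by maximizing a statistic over the same data that would later be tested, a naive test at the selected locations is biased; that is the gap post-selection inference is meant to address.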