Chan Jeffrey, Perrone Valerio, Spence Jeffrey P, Jenkins Paul A, Mathieson Sara, Song Yun S
University of California, Berkeley.
University of Warwick.
Adv Neural Inf Process Syst. 2018 Dec;31:8594-8605.
An explosion of high-throughput DNA sequencing in the past decade has led to a surge of interest in population-scale inference with whole-genome data. Recent work in population genetics has centered on designing inference methods for relatively simple model classes, and few scalable, general-purpose inference techniques exist for more realistic, complex models. Building such techniques requires addressing two inferential challenges: (1) population data are exchangeable, calling for methods that efficiently exploit the symmetries of the data, and (2) computing likelihoods is intractable because it requires integrating over a set of correlated, extremely high-dimensional latent variables. These challenges are traditionally tackled by likelihood-free methods that use scientific simulators to generate datasets and reduce them to hand-designed, permutation-invariant summary statistics, often leading to inaccurate inference. In this work, we develop an exchangeable neural network that performs summary-statistic-free, likelihood-free inference. Our framework can be applied in a black-box fashion across a variety of simulation-based tasks, both within and outside biology. We demonstrate the power of our approach on the recombination hotspot testing problem, where it outperforms the state of the art.
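The exchangeability requirement in challenge (1) can be illustrated with a minimal sketch of a generic permutation-invariant network in the DeepSets style. The same feature map is applied to every row (individual) of a haplotype matrix, and the rows are then pooled with a symmetric function (the mean), so reordering individuals cannot change the output. This is an assumption-laden illustration, not the authors' architecture: the names `phi` and `exchangeable_forward`, the single tanh layer, the logistic output, and the random weights (which in practice would be learned from simulated datasets) are all hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def phi(row: Sequence[int], weights: Matrix) -> List[float]:
    """Hypothetical shared per-row feature map: one dense layer with tanh.

    The same weights are applied to every row, which is what makes the
    later pooling step meaningful for exchangeable data.
    """
    return [math.tanh(sum(w * x for w, x in zip(w_row, row))) for w_row in weights]


def exchangeable_forward(
    data: Sequence[Sequence[int]], phi_weights: Matrix, rho_weights: Sequence[float]
) -> float:
    """Permutation-invariant score for a 0/1 haplotype matrix.

    Rows are individuals (exchangeable), columns are sites (ordered).
    Returns a value in (0, 1), e.g. an illustrative hotspot probability.
    """
    features = [phi(row, phi_weights) for row in data]
    n = len(features)
    # Symmetric pooling across rows: the mean does not depend on row order.
    pooled = [sum(f[k] for f in features) / n for k in range(len(phi_weights))]
    # Hypothetical output head ("rho"): a single logistic unit.
    z = sum(w * p for w, p in zip(rho_weights, pooled))
    return 1.0 / (1.0 + math.exp(-z))


# Toy usage with random placeholder weights (not trained).
rng = random.Random(0)
n_rows, n_sites, n_hidden = 6, 5, 4
data = [[rng.randint(0, 1) for _ in range(n_sites)] for _ in range(n_rows)]
phi_w = [[rng.gauss(0.0, 1.0) for _ in range(n_sites)] for _ in range(n_hidden)]
rho_w = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]

shuffled = data[:]
rng.shuffle(shuffled)  # permute individuals only; sites keep their order

p1 = exchangeable_forward(data, phi_w, rho_w)
p2 = exchangeable_forward(shuffled, phi_w, rho_w)
print(p1, p2, math.isclose(p1, p2, rel_tol=1e-12))
```

The two scores agree up to floating-point rounding (summation order can differ slightly), which is why the comparison uses `math.isclose` rather than `==`. Because the invariance is built into the architecture, no hand-designed summary statistic is needed to remove the dependence on row order.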