A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks.

Authors

Chan Jeffrey, Perrone Valerio, Spence Jeffrey P, Jenkins Paul A, Mathieson Sara, Song Yun S

Affiliations

University of California, Berkeley.

University of Warwick.

Publication information

Adv Neural Inf Process Syst. 2018 Dec;31:8594-8605.

PMID: 33244210

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7687905/

Abstract

An explosion of high-throughput DNA sequencing in the past decade has led to a surge of interest in population-scale inference with whole-genome data. Recent work in population genetics has centered on designing inference methods for relatively simple model classes, and few scalable general-purpose inference techniques exist for more realistic, complex models. To achieve this, two inferential challenges need to be addressed: (1) population data are exchangeable, calling for methods that efficiently exploit the symmetries of the data, and (2) computing likelihoods is intractable as it requires integrating over a set of correlated, extremely high-dimensional latent variables. These challenges are traditionally tackled by likelihood-free methods that use scientific simulators to generate datasets and reduce them to hand-designed, permutation-invariant summary statistics, often leading to inaccurate inference. In this work, we develop an exchangeable neural network that performs summary statistic-free, likelihood-free inference. Our framework can be applied in a black-box fashion across a variety of simulation-based tasks, both within and outside biology. We demonstrate the power of our approach on the recombination hotspot testing problem, outperforming the state-of-the-art.
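
The abstract does not spell out the network architecture, so the following is only a minimal sketch of what "exchangeable" (permutation-invariant) can mean in practice. It assumes one common construction: embed every haplotype with the same weights, pool the embeddings with a symmetric function (here the mean), and map the pooled vector to a score. The names `phi`, `rho`, `exchangeable_features` and `predict`, the layer sizes, and the random untrained weights are all hypothetical and are not taken from the paper.

```python
import math
import random


def make_dense(n_in, n_out, rng):
    """Random weights for a toy dense layer (hypothetical, untrained)."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]


def dense_relu(weights, x):
    """Apply one dense layer followed by ReLU to a single vector."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def exchangeable_features(haplotypes, phi):
    """Embed each haplotype with the same weights, then average over haplotypes.

    The mean is symmetric in its arguments, so reordering the rows of
    `haplotypes` cannot change the result (up to floating-point rounding).
    """
    embedded = [dense_relu(phi, h) for h in haplotypes]
    n = len(embedded)
    return [sum(column) / n for column in zip(*embedded)]


def predict(haplotypes, phi, rho):
    """Map a whole sample of haplotypes to a score in (0, 1)."""
    pooled = exchangeable_features(haplotypes, phi)
    logit = sum(w * v for w, v in zip(rho, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
n_sites, n_hidden = 8, 4
phi = make_dense(n_sites, n_hidden, rng)              # shared per-haplotype embedding
rho = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]  # readout on the pooled vector

# Toy sample: 6 haplotypes, each a 0/1 vector over 8 sites.
sample = [[rng.randint(0, 1) for _ in range(n_sites)] for _ in range(6)]
shuffled = sample[:]
rng.shuffle(shuffled)

score_original = predict(sample, phi, rho)
score_shuffled = predict(shuffled, phi, rho)
print(math.isclose(score_original, score_shuffled, rel_tol=1e-9))
```

In a likelihood-free setting such a network would be trained on datasets drawn from a simulator rather than evaluated with random weights; the sketch only illustrates that the output does not depend on the order of the haplotypes.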
