INTERPRETING GENERATIVE ADVERSARIAL NETWORKS TO INFER NATURAL SELECTION FROM GENETIC DATA.

Authors

Rebecca Riley, Iain Mathieson, Sara Mathieson

Affiliations

Department of Computer Science, Haverford College, Haverford PA, 19041 USA.

Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia PA, 19104 USA.

Publication information

bioRxiv. 2023 Jul 9:2023.03.07.531546. doi: 10.1101/2023.03.07.531546.

DOI:10.1101/2023.03.07.531546
PMID:36945387
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10028936/
Abstract

Understanding natural selection in humans and other species is a major focus for the use of machine learning in population genetics. Existing methods rely on computationally intensive simulated training data. Unlike efficient neutral coalescent simulations for demographic inference, realistic simulations of selection typically require slow forward simulations. Because there are many possible modes of selection, a high dimensional parameter space must be explored, with no guarantee that the simulated models are close to the real processes. Mismatches between simulated training data and real test data can lead to incorrect inference. Finally, it is difficult to interpret trained neural networks, leading to a lack of understanding about what features contribute to classification. Here we develop a new approach to detect selection that requires relatively few selection simulations during training. We use a Generative Adversarial Network (GAN) trained to simulate realistic neutral data. The resulting GAN consists of a generator (fitted demographic model) and a discriminator (convolutional neural network). For a genomic region, the discriminator predicts whether it is "real" or "fake" in the sense that it could have been simulated by the generator. As the "real" training data includes regions that experienced selection and the generator cannot produce such regions, regions with a high probability of being real are likely to have experienced selection. To further incentivize this behavior, we "fine-tune" the discriminator with a small number of selection simulations. We show that this approach has high power to detect selection in simulations, and that it finds regions under selection identified by state-of-the-art population genetic methods in three human populations. Finally, we show how to interpret the trained networks by clustering hidden units of the discriminator based on their correlation patterns with known summary statistics.
In summary, our approach is a novel, efficient, and powerful way to use machine learning to detect natural selection.
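
The interpretation step described in the abstract (relating hidden units of the discriminator to known summary statistics) can be illustrated with a small sketch. This is not the authors' implementation: the abstract mentions clustering on correlation patterns, whereas the simplified version below only assigns each hidden unit to the statistic it correlates with most strongly. The unit names (`h0`, `h1`, `h2`), the statistic names, the toy values, and the `threshold` parameter are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_profiles(activations, stats):
    """Correlate every hidden unit with every summary statistic across regions.

    activations: {unit_name: [activation per region]}
    stats:       {stat_name: [statistic value per region]}
    Returns {unit_name: {stat_name: r}}.
    """
    return {
        unit: {stat: pearson(act, values) for stat, values in stats.items()}
        for unit, act in activations.items()
    }


def group_units(profiles, threshold=0.5):
    """Group units by the statistic with the largest |r|.

    Units whose strongest correlation is below `threshold` (a hypothetical
    cutoff) are collected under "unassigned".
    """
    groups = {}
    for unit, profile in profiles.items():
        best = max(profile, key=lambda stat: abs(profile[stat]))
        label = best if abs(profile[best]) >= threshold else "unassigned"
        groups.setdefault(label, []).append(unit)
    return groups


# Toy data for five genomic regions (made-up values, for illustration only).
stats = {
    "tajimas_d": [-2.0, -1.0, 0.0, 1.0, 2.0],
    "num_snps": [10.0, 30.0, 20.0, 50.0, 40.0],
}
activations = {
    "h0": [4.0, 2.0, 0.0, -2.0, -4.0],  # decreases linearly with tajimas_d
    "h1": [1.0, 3.0, 2.0, 5.0, 4.0],    # increases linearly with num_snps
    "h2": [1.0, 1.0, 1.0, 1.0, 1.0],    # constant, carries no signal
}

profiles = correlation_profiles(activations, stats)
groups = group_units(profiles)
print(groups)
```

In this toy example `h0` is grouped with `tajimas_d`, `h1` with `num_snps`, and the constant unit `h2` is left unassigned. A fuller analysis would cluster the complete correlation profiles rather than take a single best match per unit.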

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fea1/10331864/ce19ea707811/nihpp-2023.03.07.531546v2-f0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fea1/10331864/23c9fc903a88/nihpp-2023.03.07.531546v2-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fea1/10331864/8d0af24f68ca/nihpp-2023.03.07.531546v2-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fea1/10331864/603de636d03c/nihpp-2023.03.07.531546v2-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fea1/10331864/56dcef71cc1e/nihpp-2023.03.07.531546v2-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fea1/10331864/7e4e8e950fd7/nihpp-2023.03.07.531546v2-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fea1/10331864/c270ad752763/nihpp-2023.03.07.531546v2-f0007.jpg

Similar articles

1. INTERPRETING GENERATIVE ADVERSARIAL NETWORKS TO INFER NATURAL SELECTION FROM GENETIC DATA.
   bioRxiv. 2023 Jul 9:2023.03.07.531546. doi: 10.1101/2023.03.07.531546.
2. Interpreting generative adversarial networks to infer natural selection from genetic data.
   Genetics. 2024 Apr 3;226(4). doi: 10.1093/genetics/iyae024.
3. Enhancing classification of cells procured from bone marrow aspirate smears using generative adversarial networks and sequential convolutional neural network.
   Comput Methods Programs Biomed. 2022 Sep;224:107019. doi: 10.1016/j.cmpb.2022.107019. Epub 2022 Jul 10.
4. Automatic inference of demographic parameters using generative adversarial networks.
   Mol Ecol Resour. 2021 Nov;21(8):2689-2705. doi: 10.1111/1755-0998.13386. Epub 2021 May 3.
5. Generative adversarial network based synthetic data training model for lightweight convolutional neural networks.
   Multimed Tools Appl. 2023 May 20:1-23. doi: 10.1007/s11042-023-15747-6.
6. Quantum Generative Adversarial Learning.
   Phys Rev Lett. 2018 Jul 27;121(4):040502. doi: 10.1103/PhysRevLett.121.040502.
7. Fast and accurate dose predictions for novel radiotherapy treatments in heterogeneous phantoms using conditional 3D-UNet generative adversarial networks.
   Med Phys. 2022 May;49(5):3389-3404. doi: 10.1002/mp.15555. Epub 2022 Mar 3.
8. Deep Convolutional Generative Adversarial Network (dcGAN) Models for Screening and Design of Small Molecules Targeting Cannabinoid Receptors.
   Mol Pharm. 2019 Nov 4;16(11):4451-4460. doi: 10.1021/acs.molpharmaceut.9b00500. Epub 2019 Oct 24.
9. LMGAN: Linguistically Informed Semi-Supervised GAN with Multiple Generators.
   Sensors (Basel). 2022 Nov 13;22(22):8761. doi: 10.3390/s22228761.
10. CiwGAN and fiwGAN: Encoding information in acoustic data to model lexical learning with Generative Adversarial Networks.
    Neural Netw. 2021 Jul;139:305-325. doi: 10.1016/j.neunet.2021.03.017. Epub 2021 Mar 19.
