Scripps Research Translational Institute, Scripps Research Institute, La Jolla, United States.
Department of Integrative Structural and Computational Biology, Scripps Research, La Jolla, United States.
eLife. 2022 Sep 23;11:e75600. doi: 10.7554/eLife.75600.
Genotype imputation is a foundational tool for population genetics. Standard statistical imputation approaches rely on the co-location of large whole-genome sequencing-based reference panels, powerful computing environments, and potentially sensitive genetic study data. This requirement creates computational-resource and privacy-risk barriers to accessing cutting-edge imputation techniques. Moreover, the accuracy of current statistical approaches is known to degrade in regions of low and complex linkage disequilibrium. Artificial neural network-based imputation approaches may overcome these limitations by encoding complex genotype relationships in easily portable inference models. Here, we demonstrate an autoencoder-based approach for genotype imputation, using a large, commonly used reference panel and spanning the entirety of human chromosome 22. Our autoencoder-based genotype imputation strategy achieved superior imputation accuracy across the allele-frequency spectrum and across genomes of diverse ancestry, while delivering at least fourfold faster inference run time relative to standard imputation tools.
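As an illustration of the general idea of autoencoder-based imputation, the following is a minimal sketch of a denoising autoencoder trained on toy haplotypes. It is not the authors' model: the abstract does not specify the architecture, loss, or hyperparameters, so the single hidden layer, the -1/0/+1 input encoding (0 meaning "missing"), the cross-entropy loss, and every function name and parameter below are assumptions chosen for brevity. The simulated "founder" haplotypes are a crude stand-in for linkage disequilibrium, not real reference-panel data.

```python
import numpy as np


def simulate_haplotypes(n_samples, n_variants, rng):
    """Toy data (assumption): each sample copies one of 4 founder haplotypes
    with rare allele flips, so nearby variants are strongly correlated."""
    founders = rng.integers(0, 2, size=(4, n_variants))
    picks = rng.integers(0, 4, size=n_samples)
    flips = rng.random((n_samples, n_variants)) < 0.02
    return np.where(flips, 1 - founders[picks], founders[picks]).astype(float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x_in, params):
    """Encode to a hidden layer, then decode to per-variant alt-allele probabilities."""
    w1, b1, w2, b2 = params
    h = np.tanh(x_in @ w1 + b1)
    return h, sigmoid(h @ w2 + b2)


def train_denoising_autoencoder(x, n_hidden=16, mask_rate=0.5,
                                epochs=300, lr=0.5, seed=0):
    """Full-batch gradient descent on binary cross-entropy.
    Each epoch hides a random subset of alleles and asks the network
    to reconstruct the complete haplotype from what remains."""
    rng = np.random.default_rng(seed)
    n, m = x.shape
    w1 = rng.normal(0.0, 0.1, (m, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.1, (n_hidden, m))
    b2 = np.zeros(m)
    for _ in range(epochs):
        keep = rng.random(x.shape) >= mask_rate
        # Observed alleles become -1/+1; masked positions become 0.
        x_in = np.where(keep, 2.0 * x - 1.0, 0.0)
        h, p = forward(x_in, (w1, b1, w2, b2))
        dz2 = (p - x) / n                    # gradient of BCE w.r.t. output logits
        dz1 = (dz2 @ w2.T) * (1.0 - h ** 2)  # backpropagate through tanh
        w2 -= lr * (h.T @ dz2)
        b2 -= lr * dz2.sum(axis=0)
        w1 -= lr * (x_in.T @ dz1)
        b1 -= lr * dz1.sum(axis=0)
    return w1, b1, w2, b2


def impute(x_missing, params):
    """Fill NaN entries with predicted alt-allele probabilities;
    observed entries are returned unchanged."""
    missing = np.isnan(x_missing)
    x_in = np.where(missing, 0.0, 2.0 * x_missing - 1.0)
    _, p = forward(x_in, params)
    return np.where(missing, p, x_missing)
```

To try it, one could generate a panel with `simulate_haplotypes(200, 40, np.random.default_rng(1))`, train on it, set some entries of a held-out sample to `np.nan`, and pass that sample to `impute`. Once trained, only the weight arrays are needed for inference, which is the sense in which such a model is portable without shipping the reference panel itself.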