Neural ADMIXTURE for rapid genomic clustering.

Authors

Albert Dominguez Mantes, Daniel Mas Montserrat, Carlos D. Bustamante, Xavier Giró-i-Nieto, Alexander G. Ioannidis

Affiliations

Department of Biomedical Data Science, Stanford Medical School, Stanford, CA, United States.

Signal Theory and Communications Department, Universitat Politècnica de Catalunya, Barcelona, Catalonia, Spain.

Publication information

Nat Comput Sci. 2023 Jul;3(7):621-629. doi: 10.1038/s43588-023-00482-7. Epub 2023 Jul 6.

DOI:10.1038/s43588-023-00482-7
PMID:37600116
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10438426/
Abstract

Characterizing the genetic structure of large cohorts has become increasingly important as genetic studies extend to massive, increasingly diverse biobanks. Popular methods decompose individual genomes into fractional cluster assignments with each cluster representing a vector of DNA variant frequencies. However, with rapidly increasing biobank sizes, these methods have become computationally intractable. Here we present Neural ADMIXTURE, a neural network autoencoder that follows the same modeling assumptions as the current standard algorithm, ADMIXTURE, while reducing the compute time by orders of magnitude surpassing even the fastest alternatives. One month of continuous compute using ADMIXTURE can be reduced to just hours with Neural ADMIXTURE. A multi-head approach allows Neural ADMIXTURE to offer even further acceleration by calculating multiple cluster numbers in a single run. Furthermore, the models can be stored, allowing cluster assignment to be performed on new data in linear time without needing to share the training samples.
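
The modeling idea summarized in the abstract can be illustrated with a small NumPy sketch: each genome is encoded as fractional cluster assignments Q (rows summing to 1), each cluster is a vector of variant frequencies F (values between 0 and 1), and the genotype is reconstructed as Q times F. This is only an illustrative forward pass under stated assumptions, not the authors' implementation: the layer sizes, the tanh activation, the binary cross-entropy loss, and all names (`W_shared`, `make_head`, `assign`) are hypothetical choices made for the example. The loop over several cluster numbers K mimics the multi-head idea of sharing one encoder across several K in a single run.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_head(rng, hidden, n_snps, K):
    # One head per cluster number K (hypothetical parameterization):
    # encoder weights mapping the shared hidden layer to K clusters,
    # and decoder logits that become K frequency vectors after a sigmoid.
    return {
        "W_head": rng.normal(scale=0.1, size=(hidden, K)),
        "F_logits": rng.normal(size=(K, n_snps)),
    }

def assign(X, W_shared, head):
    # Encoder: genotypes -> shared hidden layer -> fractional assignments Q.
    H = np.tanh(X @ W_shared)          # activation is an assumption
    Q = softmax(H @ head["W_head"])    # (n_samples, K), rows sum to 1
    # Decoder: cluster frequency vectors F, reconstruction X_hat = Q @ F.
    F = sigmoid(head["F_logits"])      # (K, n_snps), values in (0, 1)
    return Q, F, Q @ F

def bce(X, X_hat, eps=1e-9):
    # Binary cross-entropy between scaled genotypes and reconstruction.
    X_hat = np.clip(X_hat, eps, 1.0 - eps)
    return float(-np.mean(X * np.log(X_hat) + (1 - X) * np.log(1 - X_hat)))

rng = np.random.default_rng(0)
n_samples, n_snps, hidden = 6, 20, 8

# Toy genotypes: allele counts 0/1/2 scaled to [0, 1].
X = rng.integers(0, 3, size=(n_samples, n_snps)) / 2.0

# Shared encoder layer plus one head per K (untrained, random weights).
W_shared = rng.normal(scale=0.1, size=(n_snps, hidden))
heads = {K: make_head(rng, hidden, n_snps, K) for K in (2, 3, 4)}
results = {K: assign(X, W_shared, head) for K, head in heads.items()}

# Assigning new samples only needs the stored weights, not the training data.
X_new = rng.integers(0, 3, size=(2, n_snps)) / 2.0
Q_new, _, _ = assign(X_new, W_shared, heads[3])
```

In this sketch the weights are random, so the assignments are meaningless; a real model would first minimize a reconstruction loss such as `bce` over the training genotypes. The last two lines show why assignment on new data is cheap: it is a single forward pass per sample through stored weights.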

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c029/10438426/4b1b22067338/nihms-1917269-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c029/10438426/60443baad9b8/nihms-1917269-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c029/10438426/45b55e66484e/nihms-1917269-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c029/10438426/766a0cfe6aba/nihms-1917269-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c029/10438426/cd238bca4fcd/nihms-1917269-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c029/10438426/00aa7ab67c0f/nihms-1917269-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c029/10438426/d14c6874d173/nihms-1917269-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c029/10438426/5645d5398aca/nihms-1917269-f0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c029/10438426/5a017f192dfc/nihms-1917269-f0009.jpg
