Fast hierarchical Bayesian analysis of population structure.

Affiliations

Parasites and Microbes, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK.

Department of Microbiology, New York University School of Medicine, NY 10016, USA.

Publication information

Nucleic Acids Res. 2019 Jun 20;47(11):5539-5549. doi: 10.1093/nar/gkz361.

PMID:31076776

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6582336/

Abstract

We present fastbaps, a fast solution to the genetic clustering problem. Fastbaps rapidly identifies an approximate fit to a Dirichlet process mixture model (DPM) for clustering multilocus genotype data. Our efficient model-based clustering approach is able to cluster datasets 10-100 times larger than the existing model-based methods, which we demonstrate by analyzing an alignment of over 110 000 sequences of HIV-1 pol genes. We also provide a method for rapidly partitioning an existing hierarchy in order to maximize the DPM model marginal likelihood, allowing us to split phylogenetic trees into clades and subclades using a population genomic model. Extensive tests on simulated data as well as a diverse set of real bacterial and viral datasets show that fastbaps provides comparable or improved solutions to previous model-based methods, while being significantly faster. The method is made freely available under an open source MIT licence as an easy to use R package at https://github.com/gtonkinhill/fastbaps.
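The abstract describes scoring partitions of multilocus genotype data by a model marginal likelihood. As a rough illustration only, the sketch below scores a partition with an independent Dirichlet-multinomial model per locus and a symmetric prior. It is not the fastbaps implementation: the function names, the prior value `alpha`, the four-letter alphabet, and the toy sequences are all assumptions made for this example, and the Dirichlet process prior over partitions is left out.

```python
from collections import Counter
from math import lgamma


def cluster_log_ml(seqs, alphabet="ACGT", alpha=1.0):
    """Log marginal likelihood of one cluster of aligned sequences.

    Assumption: loci are independent and each follows a
    Dirichlet-multinomial model with a symmetric prior `alpha`.
    Characters outside `alphabet` (gaps, N) are ignored.
    """
    k = len(alphabet)
    total = 0.0
    for column in zip(*seqs):  # iterate over alignment columns (loci)
        counts = Counter(column)
        n = sum(counts[a] for a in alphabet)
        total += lgamma(k * alpha) - lgamma(k * alpha + n)
        for a in alphabet:
            total += lgamma(alpha + counts[a]) - lgamma(alpha)
    return total


def partition_log_ml(seqs, labels, alphabet="ACGT", alpha=1.0):
    """Sum of per-cluster log marginal likelihoods for a labelling."""
    clusters = {}
    for seq, label in zip(seqs, labels):
        clusters.setdefault(label, []).append(seq)
    return sum(
        cluster_log_ml(members, alphabet, alpha)
        for members in clusters.values()
    )


# Hypothetical toy alignment: two clearly distinct groups.
toy_seqs = ["AAAA", "AAAA", "CCCC", "CCCC"]
split_score = partition_log_ml(toy_seqs, [0, 0, 1, 1])
merged_score = partition_log_ml(toy_seqs, [0, 0, 0, 0])

print("two clusters:", split_score)
print("one cluster: ", merged_score)
```

On this toy input the two-cluster labelling scores higher than the single-cluster one, which is the kind of comparison a partition search would use when deciding where to cut a hierarchy.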
