Phylogenetic understanding of clonal populations in an era of whole genome sequencing.

Author information

Pearson Talima, Okinaka Richard T, Foster Jeffrey T, Keim Paul

Affiliation

Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA.

Publication information

Infect Genet Evol. 2009 Sep;9(5):1010-9. doi: 10.1016/j.meegid.2009.05.014. Epub 2009 May 27.

DOI:10.1016/j.meegid.2009.05.014

PMID:19477301

Abstract

Phylogenetic hypotheses using whole genome sequences have the potential for unprecedented accuracy, yet a failure to understand issues associated with discovery bias, character sampling, and strain sampling can lead to highly erroneous conclusions. For microbial pathogens, phylogenies derived from whole genome sequences are becoming more common, as large numbers of characters distributed across entire genomes can yield extremely accurate phylogenies, particularly for strictly clonal populations. The availability of whole genomes is increasing as new sequencing technologies reduce the cost and time required for genome sequencing. Until entire sample collections can be fully sequenced, harnessing the phylogenetic power from whole genome sequences in more than a small subset of fully sequenced strains requires the integration of whole genome and partial genome genotyping data. Such integration involves discovering evolutionarily stable polymorphic characters by whole genome comparisons, then determining allelic states across a wide panel of isolates using high-throughput genotyping technologies. Here, we demonstrate how such an approach using single nucleotide polymorphisms (SNPs) yields highly accurate, but biased phylogenetic reconstructions and how the accuracy of the resulting tree is compromised by incomplete taxon and character sampling. Despite recent phylogenetic work detailing the strengths and biases of integrating whole genome and partial genome genotype data, these issues are relatively new and remain poorly understood by many researchers. Here, we revisit these biases and provide strategies for maximizing phylogenetic accuracy. Although we write this review with bacterial pathogens in mind, these concepts apply to any clonally reproducing population or indeed to any evolutionarily stable marker that is inherited in a strictly clonal manner. Understanding the ways in which current and emerging technologies can be used to maximize phylogenetic knowledge is advantageous only with a complete understanding of the strengths and weaknesses of these methods.
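The discovery bias described in the abstract can be illustrated with a small toy model. The sketch below is not the authors' method or data: the tree, the strain names (`A` to `E`), the SNP labels (`s1` to `s10`), and the helper functions are all hypothetical and chosen only for illustration. It assumes strictly clonal inheritance with no homoplasy, so a strain carries the derived state of every SNP on its path to the root. SNPs are "discovered" by comparing a subset of sequenced genomes and then "genotyped" across all strains.

```python
# Hypothetical clonal tree: node -> (parent, SNPs arising on the branch leading to the node).
TREE = {
    "root": (None, set()),
    "n1": ("root", {"s1"}),
    "A": ("n1", {"s2", "s3"}),
    "n2": ("n1", {"s4"}),
    "B": ("n2", {"s5"}),
    "C": ("n2", {"s6", "s7"}),
    "n3": ("root", {"s8"}),
    "D": ("n3", {"s9"}),
    "E": ("n3", {"s10"}),
}
STRAINS = ["A", "B", "C", "D", "E"]


def derived_snps(strain):
    """Return every SNP in the derived state for a strain (union along its path to the root)."""
    snps = set()
    node = strain
    while node is not None:
        parent, mutations = TREE[node]
        snps |= mutations
        node = parent
    return snps


def discover(sequenced):
    """Return SNPs that are polymorphic among the fully sequenced genomes."""
    profiles = [derived_snps(s) for s in sequenced]
    return set.union(*profiles) - set.intersection(*profiles)


def genotype(panel):
    """Score every strain at the discovered SNPs only (the partial-genome genotyping step)."""
    return {s: frozenset(derived_snps(s) & panel) for s in STRAINS}


def distinct_genotypes(panel):
    """Count how many strains remain distinguishable with a given SNP panel."""
    return len(set(genotype(panel).values()))


if __name__ == "__main__":
    biased_panel = discover(["A", "B"])   # only two genomes sequenced
    full_panel = discover(STRAINS)        # every strain sequenced
    print("SNPs found from A vs B:", sorted(biased_panel))
    print("Distinct genotypes, biased panel:", distinct_genotypes(biased_panel))
    print("Distinct genotypes, full panel:", distinct_genotypes(full_panel))
```

In this toy case, comparing only `A` and `B` recovers just the SNPs on the path connecting those two genomes. Strain `C` is still placed correctly along that path but loses its private branch, and `D` and `E` become indistinguishable from each other. Sequencing all strains restores the missing branches, which is the effect of taxon sampling that the abstract refers to.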


Similar articles

Phylogenetic understanding of clonal populations in an era of whole genome sequencing.

Infect Genet Evol. 2009 Sep;9(5):1010-9. doi: 10.1016/j.meegid.2009.05.014. Epub 2009 May 27.

Genome-based phylogenetic analysis of Streptomyces and its relatives.

Mol Phylogenet Evol. 2010 Mar;54(3):763-72. doi: 10.1016/j.ympev.2009.11.019. Epub 2009 Dec 3.

Discrimination and phylogenomic classification of Bacillus anthracis-cereus-thuringiensis strains based on LC-MS/MS analysis of whole cell protein digests.

Anal Chem. 2010 Jan 1;82(1):145-55. doi: 10.1021/ac9015648.

Strain-specific single-nucleotide polymorphism assays for the Bacillus anthracis Ames strain.

J Clin Microbiol. 2007 Jan;45(1):47-53. doi: 10.1128/JCM.01233-06. Epub 2006 Nov 8.

Making the most of mitochondrial genomes--markers for phylogeny, molecular ecology and barcodes in Schistosoma (Platyhelminthes: Digenea).

Int J Parasitol. 2007 Oct;37(12):1401-18. doi: 10.1016/j.ijpara.2007.04.014. Epub 2007 May 10.

Whole-genome prokaryotic phylogeny.

Bioinformatics. 2005 May 15;21(10):2329-35. doi: 10.1093/bioinformatics/bth324. Epub 2004 May 27.

Whole Genome Analysis of Injectional Anthrax Identifies Two Disease Clusters Spanning More Than 13 Years.

EBioMedicine. 2015 Oct 6;2(11):1613-8. doi: 10.1016/j.ebiom.2015.10.004. eCollection 2015 Nov.

BPhyOG: an interactive server for genome-wide inference of bacterial phylogenies based on overlapping genes.

BMC Bioinformatics. 2007 Jul 25;8:266. doi: 10.1186/1471-2105-8-266.

A novel feature-based method for whole genome phylogenetic analysis without alignment: application to HEV genotyping and subtyping.

Biochem Biophys Res Commun. 2008 Apr 4;368(2):223-30. doi: 10.1016/j.bbrc.2008.01.070. Epub 2008 Jan 28.

Animal phylogenomics: multiple interspecific genome comparisons.

Methods Enzymol. 2005;395:104-33. doi: 10.1016/S0076-6879(05)95008-8.

Cited by

Population sequencing for phylogenetic diversity and transmission analyses.

Proc Natl Acad Sci U S A. 2025 Jun 10;122(23):e2424797122. doi: 10.1073/pnas.2424797122. Epub 2025 Jun 3.

Optimizing microbiome reference databases with PacBio full-length 16S rRNA sequencing for enhanced taxonomic classification and biomarker discovery.

Front Microbiol. 2024 Nov 25;15:1485073. doi: 10.3389/fmicb.2024.1485073. eCollection 2024.

Whole-genome sequencing-based analysis of Brucella species isolated from ruminants in various regions of Türkiye.

BMC Infect Dis. 2024 Oct 30;24(1):1220. doi: 10.1186/s12879-024-09921-w.

Population sequencing for diversity and transmission analyses.

bioRxiv. 2024 Jun 20:2024.06.18.599478. doi: 10.1101/2024.06.18.599478.

A Guide to Phylogenomic Inference.

Methods Mol Biol. 2024;2802:267-345. doi: 10.1007/978-1-0716-3838-5_11.

Whole-Genome sequencing in routine epidemiology - scoping the potential.

Microb Genom. 2024 Feb;10(2). doi: 10.1099/mgen.0.001185.

Global phylogenomic diversity of : spread of a dominant lineage.

Front Microbiol. 2023 Nov 29;14:1287046. doi: 10.3389/fmicb.2023.1287046. eCollection 2023.

Geo-epidemiology of animal tuberculosis and genotypes in livestock in a small, high-incidence area in Sicily, Italy.

Front Microbiol. 2023 Mar 17;14:1107396. doi: 10.3389/fmicb.2023.1107396. eCollection 2023.

Whole Genome Sequencing Refines Knowledge on the Population Structure of from a Multi-Host Tuberculosis System.

Microorganisms. 2021 Jul 26;9(8):1585. doi: 10.3390/microorganisms9081585.

Pathogen to commensal? Longitudinal within-host population dynamics, evolution, and adaptation during a chronic >16-year Burkholderia pseudomallei infection.

PLoS Pathog. 2020 Mar 5;16(3):e1008298. doi: 10.1371/journal.ppat.1008298. eCollection 2020 Mar.
