Department of Cell & Systems Biology, University of Toronto, Toronto, Ontario, Canada.
Centre for the Analysis of Genome Evolution & Function, University of Toronto, Toronto, Ontario, Canada.
PLoS Comput Biol. 2022 Feb 22;18(2):e1009899. doi: 10.1371/journal.pcbi.1009899. eCollection 2022 Feb.
A critical step in studying biological features (e.g., genetic variants, gene families, metabolic capabilities, or taxa) is assessing their diversity and distribution among a sample of individuals. Accurate assessments of these patterns are essential for linking features to traits or outcomes of interest and for understanding their functional impact. It is therefore crucial that the measures used to quantify feature diversity perform robustly under any evolutionary scenario. However, the standard measures used for quantifying and comparing the distribution of features, such as prevalence, phylogenetic diversity, and related approaches, either do not take evolutionary history into consideration or assume strictly vertical patterns of inheritance. As a result, these approaches cannot accurately assess diversity for features that have undergone recombination or horizontal transfer. To address this issue, we have devised RecPD, a novel recombination-aware phylogenetic-diversity statistic for measuring the distribution and diversity of features under all evolutionary scenarios. RecPD uses ancestral-state reconstruction to map the presence or absence of features onto ancestral nodes in a species tree, and then identifies potential recombination events in the evolutionary history of the feature. We also derive several related measures from RecPD that can be used to assess and quantify evolutionary dynamics and the correlation of feature evolutionary histories. We used simulation studies to show that RecPD reliably reconstructs feature evolutionary histories under diverse recombination and loss scenarios. We then applied RecPD in two distinct real-world scenarios, a preliminary study of type III effector protein families secreted by the plant pathogenic bacterium Pseudomonas syringae and growth phenotypes of the Pseudomonas genus, and demonstrate that prevalence is an inadequate measure that obscures the potential impact of recombination.
We believe RecPD will have broad utility for revealing and quantifying complex evolutionary processes for features at any biological level.
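The contrast drawn above between prevalence, vertically inherited phylogenetic diversity, and a lineage-aware measure can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the four-tip tree, the branch lengths, the function names, and in particular the hand-supplied lineage assignments, which stand in for the ancestral-state reconstruction step described in the abstract. The `lineage_pd` function is a simplified stand-in and should not be read as the published RecPD algorithm.

```python
# Hypothetical toy species tree: child -> (parent, branch length).
# Topology: ((A,B)n1,(C,D)n2)root. Each branch is named by its child node.
TREE = {
    "A": ("n1", 1.0), "B": ("n1", 1.0),
    "C": ("n2", 1.0), "D": ("n2", 1.0),
    "n1": ("root", 2.0), "n2": ("root", 2.0),
}
TIPS = {"A", "B", "C", "D"}


def path_to_root(node):
    """Set of branches (named by child node) from `node` up to the root."""
    branches = set()
    while node in TREE:
        branches.add(node)
        node = TREE[node][0]
    return branches


def length(branches):
    """Total length of a collection of branches."""
    return sum(TREE[b][1] for b in branches)


TOTAL = length(TREE)  # total branch length of the tree


def prevalence(tips_with_feature, all_tips=TIPS):
    """Fraction of tips carrying the feature; ignores the tree entirely."""
    return len(tips_with_feature) / len(all_tips)


def faith_pd(tips_with_feature):
    """Root-inclusive Faith's PD, normalised by total tree length.

    Implicitly assumes strictly vertical inheritance from the root.
    """
    branches = set().union(*(path_to_root(t) for t in tips_with_feature))
    return length(branches) / TOTAL


def lineage_pd(lineages):
    """Simplified lineage-aware diversity (an assumption, not RecPD itself).

    `lineages` is a list of tip sets, each assumed to descend from one
    independent acquisition. For each lineage, sum the branches below its
    most recent common ancestor (MRCA) plus the MRCA's own stem branch.
    """
    total = 0.0
    for tips in lineages:
        paths = [path_to_root(t) for t in tips]
        shared = set.intersection(*paths)       # stem branch and everything above it
        within = set().union(*paths) - shared   # branches strictly below the MRCA
        # The deepest shared branch is the MRCA's stem (absent if MRCA is the root).
        stem = max(shared, key=lambda n: len(path_to_root(n)), default=None)
        if stem is not None:
            within.add(stem)
        total += length(within)
    return total / TOTAL


# Two features with identical prevalence but different histories.
vertical = {"A", "B"}     # one clade, consistent with vertical inheritance
transferred = {"A", "C"}  # two distant tips, assumed here to be two acquisitions

print(prevalence(vertical), prevalence(transferred))
print(faith_pd(vertical), faith_pd(transferred))
print(lineage_pd([{"A", "B"}]), lineage_pd([{"A"}, {"C"}]))
```

In this toy case, prevalence cannot distinguish the two features. Faith's PD scores the scattered feature higher because it counts the internal branches connecting distant tips as if the feature had been present along them. The lineage-aware sum counts only the branches within each assumed acquisition.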