Spielman Stephanie J, Weaver Steven, Shank Stephen D, Magalis Brittany Rife, Li Michael, Kosakovsky Pond Sergei L
Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA.
Methods Mol Biol. 2019;1910:427-468. doi: 10.1007/978-1-4939-9074-0_14.
Natural selection is a fundamental force shaping organismal evolution, as it both maintains function and enables adaptation and innovation. Viruses, with their typically short and largely coding genomes, experience strong and diverse selective forces, sometimes acting on timescales that can be directly measured. These selection pressures emerge from an antagonistic interplay between rapidly changing fitness requirements (immune and antiviral responses from hosts, transmission between hosts, or colonization of new host species) and functional imperatives (the ability to infect hosts or host cells and replicate within hosts). Indeed, when computational methods to quantify these evolutionary forces from molecular sequence data were first developed in the 1980s, they were initially applied to the study of viral pathogens. This preference arose largely because strong selective forces are easiest to detect in viruses and, of course, because viruses have clear biomedical relevance. The recent commoditization of affordable high-throughput sequencing has made it possible to generate truly massive genomic data sets, from which powerful and accurate methods can yield a very detailed picture of when, where, and (sometimes) how viral pathogens respond to various selective forces. Here, we present recent statistical developments and state-of-the-art methods to identify and characterize these selection pressures from protein-coding sequence alignments and phylogenies. The methods described here can reveal critical information about various evolutionary regimes acting upon viral genomes, including whole-gene selection, lineage-specific selection, and site-specific selection, while accounting for confounding biological processes such as recombination and variation in mutation rates.