CINBIO (Biomedical Research Center), University of Vigo, 36310 Vigo, Spain.
Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbaa431.
The evolution of protein-coding genes is usually driven by selective processes, which favor some evolutionary trajectories over others and thereby optimize the stability and activity of the resulting protein. Selection in this type of genetic data is commonly analyzed with the nonsynonymous/synonymous substitution rate ratio (dN/dS). However, most well-established methods for estimating this metric make crucial assumptions, such as the absence of recombination or codon frequencies that do not vary along genes, and these assumptions can bias the estimate. Here, we review the most relevant biases in dN/dS estimation and provide a detailed guide to estimating this metric with state-of-the-art procedures that account for such biases, along with illustrative practical examples and recommendations. We also discuss the traditional interpretation of the estimated dN/dS, emphasizing the importance of considering complementary biological information such as the role of the observed substitutions in the stability and function of proteins. This review is intended to help evolutionary biologists who aim to accurately estimate selection in protein-coding sequences.
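For reference, the traditional interpretation of dN/dS mentioned in the abstract is usually summarized as follows. This is the conventional textbook reading, not a formula quoted from the review; the symbol ω is introduced here only for illustration, with dN taken as the number of nonsynonymous substitutions per nonsynonymous site and dS as the number of synonymous substitutions per synonymous site.

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{purifying (negative) selection} \\
\omega = 1 & \text{neutral evolution} \\
\omega > 1 & \text{diversifying (positive) selection}
\end{cases}
```

As the abstract notes, this reading holds only to the extent that the estimate itself is unbiased, and it is best weighed together with complementary information on protein stability and function.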