Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México.
Department of Biology, Wilfrid Laurier University, Waterloo, Canada.
PeerJ. 2022 Aug 31;10:e13843. doi: 10.7717/peerj.13843. eCollection 2022.
Orthologs diverge after lineages split from each other, whereas paralogs diverge after gene duplications. Thus, orthologs are expected to remain more functionally coherent across lineages, while paralogs have been proposed as a source of new functions. Because protein functional divergence follows from non-synonymous substitutions, we performed an analysis based on the ratio of non-synonymous to synonymous substitutions (dN/dS) as a proxy for functional divergence. We used five working definitions of orthology: reciprocal best hits (RBH) and four others based on network analyses and clustering. Under every definition tested, orthologs had noticeably lower dN/dS values than paralogs, suggesting that orthologs generally tend to be more functionally stable than paralogs. These differences in dN/dS ratios persisted after we eliminated gene comparisons with potential problems, such as genes with high codon usage bias, low coverage of either of the aligned sequences, or very high sequence similarity. When comparisons were separated by the percent identity of the encoded proteins, the differences between the dN/dS ratios of orthologs and paralogs were most evident at high sequence identity and became less so as identity dropped. These last results suggest that the differences between dN/dS ratios were partially related to differences in protein identity. However, they also suggest that paralogs undergo functional divergence relatively early after duplication. Our analyses indicate that choosing orthologs as the genes most likely to be functionally coherent remains the right approach in comparative genomics.
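As an illustration of the reciprocal best hits (RBH) definition named in the abstract, the following is a minimal sketch, not the authors' pipeline. It assumes that pairwise hits between two genomes are already available as (query, subject, score) tuples; the tuple layout, the use of a single score to rank hits, and the function names `best_hits` and `reciprocal_best_hits` are all hypothetical choices made for this example.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed input format: (query gene, subject gene, alignment score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query gene to its single highest-scoring subject gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        # Keep the first hit seen for a query, then replace it only
        # when a strictly higher score appears.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted(
        (a, b) for a, b in best_ab.items() if best_ba.get(b) == a
    )


# Toy data, invented for the example.
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0),
          ("a2", "b2", 90.0), ("a3", "b1", 80.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a1", 140.0),
          ("b2", "a2", 95.0)]

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input only `a1` and `b1` choose each other, so they form the single RBH pair; `a2` picks `b2`, but `b2` scores higher against `a1`, so that pair is not reciprocal. A real analysis would also apply the kinds of filters the abstract mentions, such as alignment coverage, before calling orthologs.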