Institute for Molecular Plant Physiology and Biophysics, University of Würzburg, Würzburg, Germany.
Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO, USA.
Nat Ecol Evol. 2023 Jan;7(1):155-170. doi: 10.1038/s41559-022-01932-7. Epub 2023 Jan 5.
On macroevolutionary timescales, extensive mutations and phylogenetic uncertainty mask the signals of genotype-phenotype associations underlying convergent evolution. To overcome this problem, we extended the widely used framework of non-synonymous to synonymous substitution rate ratios and developed a novel metric, ωC, which measures the error-corrected convergence rate of protein evolution. Because ωC distinguishes natural selection from genetic noise and phylogenetic errors in both simulated and real examples, it is accurate enough to support an exploratory genome-wide search for adaptive molecular convergence without a phenotypic hypothesis or candidate genes. Using gene expression data, we explored more than 20 million branch combinations in vertebrate genes and identified joint convergence of expression patterns and protein sequences, with amino acid substitutions at functionally important sites, providing hypotheses about undiscovered phenotypes. We further extended our method with a heuristic algorithm to detect highly repetitive convergence among higher-order phylogenetic combinations, which are computationally non-trivial to search. Our approach allows bidirectional searches for genotype-phenotype associations, even in lineages that diverged hundreds of millions of years ago.
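The abstract does not state the formula for the convergence metric, so the following is only a minimal sketch under a stated assumption. By analogy with dN/dS, it assumes the metric is an error-corrected (observed/expected) convergence rate at non-synonymous sites divided by the same quantity at synonymous sites. The idea is that synonymous convergence, which should be largely neutral, serves as a baseline for convergence that arises from noise or phylogenetic error. The names `ConvergenceCounts` and `omega_c` and all counts are hypothetical. How the expected counts are obtained (for example, from a substitution model) is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvergenceCounts:
    """Hypothetical tallies of convergent substitutions for one branch combination."""
    observed_nonsyn: float  # observed convergent non-synonymous substitutions
    expected_nonsyn: float  # expected under a null model (assumed to be given)
    observed_syn: float     # observed convergent synonymous substitutions
    expected_syn: float     # expected under the same null model


def omega_c(c: ConvergenceCounts) -> float:
    """Assumed form: (O_N / E_N) / (O_S / E_S), a dN/dS-like ratio of
    error-corrected convergence rates. Values above 1 would suggest excess
    non-synonymous convergence relative to the synonymous baseline."""
    if c.expected_nonsyn <= 0 or c.expected_syn <= 0:
        raise ValueError("expected counts must be positive")
    rate_nonsyn = c.observed_nonsyn / c.expected_nonsyn
    rate_syn = c.observed_syn / c.expected_syn
    if rate_syn == 0:
        # No synonymous convergence: the ratio is undefined or unbounded.
        return float("inf") if rate_nonsyn > 0 else float("nan")
    return rate_nonsyn / rate_syn


if __name__ == "__main__":
    # Made-up counts: non-synonymous rate 12/4 = 3.0, synonymous rate 10/5 = 2.0.
    print(omega_c(ConvergenceCounts(12, 4, 10, 5)))
```

With these made-up counts the sketch returns 1.5. Normalizing each observed count by its own expectation before taking the ratio is what makes this an "error-corrected" rate in the sketch, rather than a raw count of shared substitutions.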
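The following back-of-the-envelope calculation shows why higher-order branch combinations are computationally non-trivial to search exhaustively. It is a simple combinatorial upper bound, not the study's heuristic algorithm. The tree size of 500 species is hypothetical, and the count ignores constraints such as excluding ancestor-descendant branch pairs, which a convergence search may impose.

```python
from math import comb


def branches_in_rooted_binary_tree(n_tips: int) -> int:
    """A rooted, fully bifurcating tree with n tips has 2n - 2 branches."""
    if n_tips < 2:
        raise ValueError("need at least two tips")
    return 2 * n_tips - 2


def n_branch_combinations(n_branches: int, order: int) -> int:
    """Number of unordered sets of `order` distinct branches (an upper bound:
    phylogenetic constraints on which branches may be combined are ignored)."""
    return comb(n_branches, order)


if __name__ == "__main__":
    n = branches_in_rooted_binary_tree(500)  # hypothetical 500-species tree
    for k in (2, 3, 4):
        print(f"order {k}: {n_branch_combinations(n, k):,} combinations")
```

Each additional branch in a combination multiplies the search space by roughly n/k. Pairwise searches therefore stay tractable, while exhaustive enumeration of third- and fourth-order combinations quickly becomes impractical. This growth is the motivation for a heuristic search.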