Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY.
Biodiversity Research Center, Academia Sinica, Taipei, Taiwan.
Mol Biol Evol. 2020 Aug 1;37(8):2440-2449. doi: 10.1093/molbev/msaa087.
Purifying (negative) natural selection is a hallmark of functional biological sequences, and can be detected in protein-coding genes using the ratio of nonsynonymous to synonymous substitutions per site (dN/dS). However, when two genes overlap the same nucleotide sites in different frames, synonymous changes in one gene may be nonsynonymous in the other, perturbing dN/dS. Thus, scalable methods are needed to estimate functional constraint specifically for overlapping genes (OLGs). We propose OLGenie, which implements a modification of the Wei-Zhang method. Assessment with simulations and controls from viral genomes (58 OLGs and 176 non-OLGs) demonstrates low false-positive rates and good discriminatory ability in differentiating true OLGs from non-OLGs. We also apply OLGenie to the unresolved case of HIV-1's putative antisense protein gene, showing significant purifying selection. OLGenie can be used to study known OLGs and to predict new OLGs in genome annotation. Software and example data are freely available at https://github.com/chasewnelson/OLGenie (last accessed April 10, 2020).
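The overlap problem described above can be illustrated with a minimal sketch. It is not OLGenie's implementation. It only shows, using the standard genetic code, how one nucleotide change can be synonymous in one reading frame and nonsynonymous in an overlapping frame. The function name `classify_change`, the toy sequence, and the choice of a same-strand overlap shifted by one nucleotide are assumptions made for illustration.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_change(seq, pos, new_base, frame_start):
    """Classify a single-nucleotide change at 0-based index `pos` as
    'synonymous' or 'nonsynonymous' for the forward-strand reading frame
    whose first codon begins at index `frame_start` (hypothetical helper)."""
    if pos < frame_start:
        raise ValueError("position lies before the start of this frame")
    codon_start = frame_start + 3 * ((pos - frame_start) // 3)
    if codon_start + 3 > len(seq):
        raise ValueError("position is not inside a complete codon")
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    old_codon = seq[codon_start:codon_start + 3]
    new_codon = mutated[codon_start:codon_start + 3]
    same_amino_acid = CODON_TABLE[old_codon] == CODON_TABLE[new_codon]
    return "synonymous" if same_amino_acid else "nonsynonymous"


# Toy sequence (an assumption, not real data).
# Frame starting at index 0: ATG CTG GCA
# Frame starting at index 1:  TGC TGG CAA
seq = "ATGCTGGCAA"

# Change G to C at index 5.
in_reference_frame = classify_change(seq, 5, "C", frame_start=0)  # CTG -> CTC
in_overlapping_frame = classify_change(seq, 5, "C", frame_start=1)  # TGG -> TCG

print(in_reference_frame, in_overlapping_frame)
```

In this toy case the change leaves leucine unchanged in the first frame (CTG to CTC) but replaces tryptophan with serine in the shifted frame (TGG to TCG), so counting it as simply "synonymous" would miss the constraint imposed by the second gene.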