Institute for Human Genetics, University of California, San Francisco, 513 Parnassus Avenue, Box 0794, San Francisco, CA 94143-0794, USA.
Nucleic Acids Res. 2013 Feb 1;41(3):1395-405. doi: 10.1093/nar/gks1261. Epub 2012 Dec 14.
Periodicity in nucleotide sequences arises from regular repeating patterns, which may reflect important structure and function. Although a three-base periodicity in coding regions has been known for some time and has provided the basis for powerful gene prediction algorithms, its origins are still not fully understood. Here, we show that, contrary to common belief, amino acid (AA) bias and codon usage bias are insufficient to create three-base periodicity. This article applies the rigorous spectral envelope method to systematically characterize the contributions of codon bias, AA bias and protein structural motifs to the three-base periodicity of coding sequences. The method is also used to classify CpG islands in the human genome. In addition, we show how the spectral envelope can be used to trace the evolution of viral genomes and to monitor global sequence changes without aligning to previously known genomes. This approach also detects reassortment events, such as those that led to the 2009 pandemic H1N1 virus.
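To make the idea of a spectral envelope concrete, the following Python sketch computes a sample spectral envelope for a DNA string. It is an illustration under stated assumptions, not the authors' implementation: it follows the standard definition (the largest eigenvalue, at each frequency, of the real part of the spectral matrix of base-indicator vectors, scaled by their covariance), uses the raw unsmoothed periodogram, takes T as the reference base, and assumes NumPy is available. The function name `spectral_envelope` is hypothetical.

```python
import numpy as np

def spectral_envelope(seq, alphabet="ACGT"):
    """Sample spectral envelope of a nucleotide string (illustrative sketch).

    Assumptions: raw (unsmoothed) periodogram; the last letter of
    `alphabet` is the reference category; every letter occurs in `seq`
    so that the indicator covariance matrix is invertible.
    """
    seq = seq.upper()
    letters = alphabet[:-1]  # indicators for A, C, G; T is the reference
    # One row per position, one 0/1 column per non-reference base.
    # Any character outside `letters` gives an all-zero row, like T.
    y = np.array([[1.0 if ch == a else 0.0 for a in letters] for ch in seq])
    n = y.shape[0]
    y -= y.mean(axis=0)                      # remove the mean composition
    v = (y.T @ y) / n                        # covariance of the indicators
    w, u = np.linalg.eigh(v)
    v_inv_sqrt = u @ np.diag(1.0 / np.sqrt(w)) @ u.T   # V^(-1/2)
    d = np.fft.rfft(y, axis=0)               # DFT of each indicator column
    freqs = np.fft.rfftfreq(n)               # cycles per base, 0 to 0.5
    env = np.empty(len(freqs))
    for j in range(len(freqs)):
        # Real part of the periodogram matrix at frequency j
        per = np.outer(d[j], d[j].conj()).real / n
        # Largest eigenvalue of V^(-1/2) Re(I) V^(-1/2)
        env[j] = np.linalg.eigvalsh(v_inv_sqrt @ per @ v_inv_sqrt)[-1]
    return freqs, env
```

Calling `freqs, env = spectral_envelope(dna_string)` returns the envelope on the Fourier frequencies of the sequence. A three-base periodicity would be expected to appear as a peak near frequency 1/3. In practice the periodogram is usually smoothed before the eigenvalue step, which this sketch omits for brevity.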