Sealfon Rachel S, Lin Michael F, Jungreis Irwin, Wolf Maxim Y, Kellis Manolis, Sabeti Pardis C
Genome Biol. 2015 Feb 17;16(1):38. doi: 10.1186/s13059-015-0603-7.
The increasing availability of sequence data for many viruses provides power to detect regions under unusual evolutionary constraint at high resolution. One approach leverages the synonymous substitution rate as a signature to pinpoint genic regions encoding overlapping or embedded functional elements. Protein-coding regions in viral genomes often contain overlapping RNA structural elements, reading frames, regulatory elements, microRNAs, and packaging signals. Synonymous substitutions in these regions would be selectively disfavored, and thus these regions are characterized by excess synonymous constraint. Codon choice can also modulate transcriptional efficiency, translational accuracy, and protein folding.
We developed a phylogenetic codon model-based framework, FRESCo, designed to find regions of excess synonymous constraint in short, deep alignments, such as individual viral genes across many sequenced isolates. We demonstrated the high specificity of our approach on simulated data and applied our framework to the protein-coding regions of approximately 30 distinct species of viruses with diverse genome architectures.
FRESCo recovers known multifunctional regions in well-characterized viruses such as hepatitis B virus, poliovirus, and West Nile virus, often at a single-codon resolution, and predicts many novel functional elements overlapping viral genes, including in Lassa and Ebola viruses. In a number of viruses, the synonymously constrained regions that we identified also display conserved, stable predicted RNA structures, including putative novel elements in multiple viral species.
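To make the idea of "excess synonymous constraint" concrete, the following is a minimal illustrative sketch and not the FRESCo method itself. FRESCo is described above as a phylogenetic codon-model framework, whereas this toy example ignores the phylogeny entirely and simply counts synonymous variation per codon column. The function names (`synonymous_diversity`, `low_synonymous_windows`), the window size, and the `ratio` threshold are all hypothetical choices made for illustration, and the input is assumed to be an in-frame, gap-aligned set of coding sequences.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def synonymous_diversity(alignment):
    """Per codon column, return the fraction of sequences carrying a codon
    that differs from the most common codon yet encodes the same amino acid.
    This is a crude, phylogeny-free proxy for the synonymous rate (assumption)."""
    n_codons = len(alignment[0]) // 3
    scores = []
    for i in range(n_codons):
        codons = [seq[3 * i:3 * i + 3].upper() for seq in alignment]
        # Skip codons containing gaps or ambiguous bases.
        codons = [c for c in codons if c in CODON_TABLE]
        if not codons:
            scores.append(0.0)
            continue
        major, _ = Counter(codons).most_common(1)[0]
        major_aa = CODON_TABLE[major]
        syn = sum(1 for c in codons if c != major and CODON_TABLE[c] == major_aa)
        scores.append(syn / len(codons))
    return scores


def low_synonymous_windows(scores, window=3, ratio=0.5):
    """Return start indices of sliding windows whose mean score falls below
    `ratio` times the gene-wide mean (hypothetical threshold, not a test statistic)."""
    if not scores:
        return []
    gene_mean = sum(scores) / len(scores)
    flagged = []
    for start in range(len(scores) - window + 1):
        window_mean = sum(scores[start:start + window]) / window
        if window_mean < ratio * gene_mean:
            flagged.append(start)
    return flagged


# Toy alignment: codons 1 and 2 show no synonymous variation.
toy = [
    "GCTAAAAAACTT",
    "GCCAAAAAACTC",
    "GCAAAAAAACTT",
    "GCTAAAAAACTA",
]
toy_scores = synonymous_diversity(toy)
toy_hits = low_synonymous_windows(toy_scores, window=2)
```

In this toy alignment the two central lysine codons never vary, so the window starting at codon index 1 is flagged. A real analysis would need a statistical model that accounts for shared ancestry among isolates, which is the role the phylogenetic codon model plays in the framework described above.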