Computational Genomics Group, The Institute for Genomic Medicine, Nationwide Children's Hospital, 575 Children's Crossroad, Columbus, OH 43215, USA.
Department of Pediatrics, College of Medicine, The Ohio State University, 370 W. 9th Avenue, Columbus, OH 43210, USA.
Gigascience. 2021 Apr 5;10(4). doi: 10.1093/gigascience/giab023.
The role of synonymous single-nucleotide variants in human health and disease is poorly understood, yet evidence suggests that this class of "silent" genetic variation plays multiple regulatory roles in both transcription and translation. One mechanism by which synonymous codons direct and modulate the translational process is through alteration of the elaborate structure formed by single-stranded mRNA molecules. While tools to computationally predict the effect of non-synonymous variants on protein structure are plentiful, analogous tools to systematically assess how synonymous variants might disrupt mRNA structure are lacking.
We developed novel software that uses a parallel processing framework for large-scale generation of RNA secondary structures and folding statistics for the transcriptome of any species. Focusing our analysis on the human transcriptome, we calculated 5 billion RNA-folding statistics for 469 million single-nucleotide variants in 45,800 transcripts. By considering the impact of all possible synonymous variants globally, we found that synonymous variants predicted to disrupt mRNA structure occur at significantly lower rates in the human population.
These findings support the hypothesis that synonymous variants may play a role in genetic disorders through their effects on mRNA structure. To evaluate the potential pathogenic impact of synonymous variants, we provide RNA stability, edge distance, and diversity metrics for every nucleotide in the human transcriptome and introduce a "Structural Predictivity Index" (SPI) to quantify the structural constraint operating on any synonymous variant. Because no single RNA-folding metric can capture the diversity of mechanisms by which a variant could alter mRNA secondary structure, we generated a SUmmarized RNA Folding (SURF) metric that provides a single measurement for predicting the impact of secondary structure-altering variants in human genetic studies.
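The exhaustive enumeration of synonymous variants mentioned above can be illustrated with a minimal sketch. It is not the authors' software: the names `CODON_TABLE` and `synonymous_snvs` are hypothetical, the standard nuclear genetic code is assumed, positions are 0-based, and no RNA folding is performed. It only lists the single-nucleotide substitutions in a coding sequence that leave the encoded amino acid unchanged, which is the set of candidates a folding pipeline would then score.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (assumption:
# nuclear code; mitochondrial transcripts would need a different table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_snvs(cds):
    """Return (position, ref_base, alt_base) for every synonymous SNV.

    `cds` is an in-frame DNA coding sequence; positions are 0-based
    offsets into `cds`. Hypothetical helper for illustration only.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    variants = []
    for start in range(0, len(cds), 3):
        codon = cds[start:start + 3]
        amino_acid = CODON_TABLE[codon]
        for offset in range(3):
            ref = codon[offset]
            for alt in BASES:
                if alt == ref:
                    continue
                mutant = codon[:offset] + alt + codon[offset + 1:]
                # Keep the substitution only if the amino acid is unchanged.
                if CODON_TABLE[mutant] == amino_acid:
                    variants.append((start + offset, ref, alt))
    return variants
```

For example, `synonymous_snvs("ATGGCT")` yields only third-position changes in the alanine codon, because methionine is encoded by a single codon and so has no synonymous substitutions.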