Ptok Johannes, Müller Lisa, Ostermann Philipp Niklas, Ritchie Anastasia, Dilthey Alexander T, Theiss Stephan, Schaal Heiner
Institute of Virology, Medical Faculty, Heinrich Heine University Düsseldorf, D-40225 Düsseldorf, Germany.
Institute of Medical Statistics and Computational Biology, University of Cologne, Cologne, Germany.
Comput Struct Biotechnol J. 2021 May 21;19:3069-3076. doi: 10.1016/j.csbj.2021.05.033. eCollection 2021.
Codon degeneracy of amino acid sequences permits an additional "mRNP code" layer underlying the genetic code that is related to RNA processing. In pre-mRNA splicing, splice site usage is determined both by intrinsic strength and by the sequence context, which provides RNA binding sites for splicing regulatory proteins. In this study, we systematically examined how the splicing regulatory properties in the neighborhood of a GT site, i.e. a potential splice site, can be modified without altering the encoded amino acids. We quantified the splicing regulatory properties of the neighborhood around a potential splice site by its Splice Site HEXplorer Weight (SSHW), which is based on the HEXplorer score algorithm. To systematically modify GT site neighborhoods, either minimizing or maximizing their SSHW, we designed the novel stochastic optimization algorithm ModCon, which applies a genetic algorithm with stochastic crossover, insertion and random mutation elements, supplemented by a heuristic sliding window approach. To assess the achievable range of SSHW in human splice donors without altering the encoded amino acids, we applied ModCon to a set of 1000 randomly selected Ensembl-annotated human splice donor sites, achieving substantial and accurate changes in SSHW. Using ModCon optimization, we successfully switched splice donor usage in a splice site competition reporter containing coding sequences from FANCA, FANCB or BRCA2, while retaining their amino acid coding information. The ModCon algorithm and its R package implementation can assist in reporter design by introducing novel splice sites, by silencing accidental, undesired splice sites, and by generally modifying the entire mRNP code while maintaining the genetic code.
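The idea of optimizing a sequence-context score under a fixed amino acid sequence can be illustrated with a minimal Python sketch. It is not the ModCon R package and does not reproduce its algorithm: it only combines synonymous-codon mutation, single-point crossover at codon boundaries, and elitist selection. The function `toy_score` is a hypothetical placeholder (purine minus pyrimidine count summed over overlapping hexamers) standing in for the SSHW, and all function names, parameters, and the example sequence are assumptions made for illustration.

```python
import random

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO_ACIDS))

# Group codons by the amino acid they encode (synonymous codon sets).
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(seq):
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))


def toy_score(seq):
    """Hypothetical placeholder score, NOT the HEXplorer score or the SSHW.

    Sums (purines - pyrimidines) over all overlapping hexamers.
    """
    total = 0
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]
        total += sum(1 if base in "AG" else -1 for base in hexamer)
    return total


def mutate(codons, rng, rate=0.2):
    """Replace each codon, with probability `rate`, by a random synonymous codon."""
    return [
        rng.choice(SYNONYMS[CODON_TO_AA[c]]) if rng.random() < rate else c
        for c in codons
    ]


def crossover(parent_a, parent_b, rng):
    """Single-point crossover at a codon boundary (needs at least two codons)."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def optimize(seq, maximize=True, generations=50, pop_size=30, seed=0):
    """Elitist genetic algorithm over synonymous codon choices."""
    rng = random.Random(seed)
    sign = 1 if maximize else -1
    start = [seq[i:i + 3] for i in range(0, len(seq), 3)]

    def fitness(codons):
        return sign * toy_score("".join(codons))

    # The unmodified sequence stays in the population, so the best score
    # can never get worse than the starting score.
    population = [start] + [mutate(start, rng, rate=0.5) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return "".join(max(population, key=fitness))


# Hypothetical example coding sequence (Met-Ala-Ser-Leu-Lys-Arg).
original = "ATGGCTTCTCTGAAACGT"
maximized = optimize(original, maximize=True)
minimized = optimize(original, maximize=False)
```

Because every mutation draws from the synonymous codon set and crossover only cuts at codon boundaries, each candidate encodes the same protein as the input. Replacing `toy_score` with a real scoring function would be the natural extension point, although the actual ModCon implementation additionally uses insertion and a sliding window heuristic that this sketch omits.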