Department of Bioengineering, University of California, Berkeley, California.
Department of Plant and Microbial Biology, University of California, Berkeley, California.
Hum Mutat. 2019 Sep;40(9):1270-1279. doi: 10.1002/humu.23790. Epub 2019 Jun 18.
Accurate interpretation of genomic variants that alter RNA splicing is critical to precision medicine. We present a computational framework, Prediction of variant Effect on Percent Spliced In (PEPSI), that predicts the splicing impact of coding and noncoding variants for the Fifth Critical Assessment of Genome Interpretation (CAGI5) "Vex-seq" challenge. PEPSI is a random forest regression model trained on multiple layers of features associated with sequence conservation and regulatory sequence elements. Unlike other splicing defect prediction tools in the literature, our framework integrates secondary structure information when predicting variants that disrupt splicing regulatory elements (SREs). We applied our model to classify splice-disrupting variants among 2,094 single-nucleotide polymorphisms from the Exome Aggregation Consortium, using the model-predicted change in percent spliced in (ΔPSI) for each tested variant. Benchmarking our model against widely used state-of-the-art tools, we demonstrate that PEPSI achieves comparable sensitivity and precision. We also show that secondary structure context can help resolve several cases in which the change in SRE counts does not match the direction of the ΔPSI measured for the tested variants.
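The quantities named in the abstract (PSI, ΔPSI, sensitivity, precision) can be illustrated with a minimal Python sketch. This is not the PEPSI implementation: the function names, the read-count definition of PSI on a 0-100 scale, and the 10-point |ΔPSI| cutoff for calling a variant splice-disrupting are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence, Tuple


def psi(inclusion_reads: int, exclusion_reads: int) -> float:
    """Percent spliced in: share of reads that include the exon (0-100).

    Assumed definition for illustration; the paper's exact estimator
    is not given in the abstract.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("PSI is undefined without any reads")
    return 100.0 * inclusion_reads / total


def delta_psi(psi_variant: float, psi_reference: float) -> float:
    """Change in PSI caused by the variant (variant minus reference)."""
    return psi_variant - psi_reference


def is_splice_disrupting(dpsi: float, threshold: float = 10.0) -> bool:
    """Call a variant splice-disrupting when |dPSI| reaches the cutoff.

    The default of 10 percentage points is a hypothetical value.
    """
    return abs(dpsi) >= threshold


def sensitivity_precision(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float]:
    """Sensitivity (recall) and precision of predicted calls against truth."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision
```

In a setting like the one described, the `dpsi` passed to `is_splice_disrupting` would be a model-predicted ΔPSI, and the resulting calls would be compared against calls derived from measured ΔPSI using `sensitivity_precision`.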