Laboratoire de Biologie et Génétique du Cancer, Centre François Baclesse, Caen, France.
Inserm U1245, UNIROUEN, FHU-G4 génomique, Normandie Université, Rouen, France.
Hum Mutat. 2022 Dec;43(12):2308-2323. doi: 10.1002/humu.24491. Epub 2022 Nov 20.
Modeling splicing is essential for variant interpretation, because any nucleotide variation can be pathogenic if it affects pre-mRNA splicing by disrupting or creating splicing motifs such as 5'/3' splice sites, branch sites, or splicing regulatory elements. Unfortunately, most in silico tools focus on a single type of splicing motif. We therefore developed the Splicing Prediction Pipeline (SPiP), which uses a machine learning approach to assess, in a single bioinformatic analysis, the effect of a variant on the different splicing motifs. We gathered a curated set of 4616 variants distributed along the sequences of 227 genes, together with their corresponding splicing studies. A Bayesian analysis gave us the number of control variants, that is, variants without impact on splicing, needed to mimic the deluge of variants from high-throughput sequencing data. The results show that SPiP can handle the diversity of splicing alterations, detecting spliceogenic variants with 83.13% sensitivity and 99% specificity. Overall performance, measured as the area under the receiver operating characteristic curve, was 0.986, better than SpliceAI and SQUIRLS (0.965 and 0.766, respectively) on the same data set. SPiP thus offers a single suite for comprehensive prediction of spliceogenicity in the genomic medicine era. SPiP is available at: https://sourceforge.net/projects/splicing-prediction-pipeline/.
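For readers less familiar with the three metrics reported above, the following minimal Python sketch shows how sensitivity, specificity, and the area under the ROC curve can be computed from a list of labels and prediction scores. It is an illustration only: the function names, the toy data, and the 0.5 threshold are assumptions made for this example, and nothing here reproduces SPiP's code, scores, or decision threshold.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive scores higher than a randomly chosen negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = spliceogenic variant, 0 = control variant.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.2, 0.1, 0.05]

sens, spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
auc = roc_auc(toy_labels, toy_scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the abstract reports both kinds of figure.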