Vermont Biomedical Research Network and Department of Biology, University of Vermont, Burlington, VT, 05405, USA.
Institute for Translational Research and Department of Family Medicine, University of North Texas Health Science Center, Fort Worth, TX, 76107, USA.
BMC Bioinformatics. 2020 Dec 3;21(Suppl 9):541. doi: 10.1186/s12859-020-03824-8.
Alternative splicing isoforms have been reported as a new and robust class of diagnostic biomarkers. Over 95% of human genes are estimated to undergo alternative splicing, a powerful means of producing functionally diverse proteins from a single gene. The emergence of next-generation sequencing technologies, especially RNA-Seq, provides novel insights into large-scale detection and analysis of alternative splicing at the transcriptional level. Advances in proteomic technologies, such as liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS), have shown tremendous power for the parallel characterization of large numbers of proteins in biological samples. Although previous qualitative comparisons between proteomics and microarray data have generally found poor correspondence, significantly higher correlation has been observed at the exon level. Combining protein and RNA data, by searching LC-MS/MS data against a customized protein database built from RNA-Seq data, may produce a subset of alternatively spliced protein isoform candidates with higher confidence.
We developed a bioinformatics workflow that uses RNA-Seq data to discover alternative splicing biomarkers from LC-MS/MS data. First, we retrieved high-confidence, novel alternative splicing biomarkers from the breast cancer RNA-Seq database. Then, we translated these sequences in silico into Isoform Junction Peptides and created a customized alternative splicing database for MS searching. Lastly, we searched the breast cancer plasma proteome data against the customized alternative splicing database using the Open Mass Spectrometry Search Algorithm (OMSSA). Twenty-six alternative splicing biomarker peptides, with one single intron event and one exon skipping event, were identified. Further interpretation of biological pathways with our Integrated Pathway Analysis Database showed that these 26 peptides are associated with Cancer, Signaling, Metabolism, Regulation, Immune System and Hemostasis pathways, consistent with the pathways of the 256 alternative splicing biomarkers identified from the RNA-Seq data.
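The translation step, which turns junction sequences into in silico Isoform Junction Peptides, can be illustrated with a minimal Python sketch. It assumes each splicing event is given as the nucleotide sequences of the two exons joined in the isoform, written 5' to 3' on the coding strand. The sketch translates the junction region in all three forward frames with the standard genetic code and keeps only stop-free stretches that contain residues encoded on both sides of the junction. The names `junction_peptides` and `to_fasta`, the 60-nt flank, the 7-residue minimum length, and the FASTA header layout are illustrative assumptions, not the authors' published parameters.

```python
from typing import Dict, List, Tuple

# Standard genetic code (NCBI translation table 1); codons ordered TCAG x TCAG x TCAG.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(nt: str) -> str:
    """Translate a DNA string codon by codon; unknown codons (e.g. with N) become 'X'."""
    nt = nt.upper()
    return "".join(CODON_TABLE.get(nt[p:p + 3], "X") for p in range(0, len(nt) - 2, 3))


def junction_peptides(upstream: str, downstream: str, flank_nt: int = 60,
                      min_len: int = 7) -> List[Tuple[int, str]]:
    """Return (frame, peptide) pairs whose residues cover the splice junction.

    Hypothetical helper: the junction sequence is the last `flank_nt` bases of the
    upstream exon joined to the first `flank_nt` bases of the downstream exon.
    Each forward frame is translated and split at stop codons; only stop-free
    stretches with residues from both sides of the junction are kept.
    """
    left = upstream[-flank_nt:]
    right = downstream[:flank_nt]
    seq = left + right
    junction = len(left)  # index of the first downstream nucleotide in `seq`
    results = []
    for frame in range(3):
        protein = translate(seq[frame:])
        start = 0  # residue index of the current segment's first residue
        for segment in protein.split("*"):
            if segment:
                end = start + len(segment) - 1
                first_nt = frame + 3 * start        # first base of the first codon
                last_nt = frame + 3 * end + 2       # last base of the last codon
                spans = first_nt < junction <= last_nt
                if spans and len(segment) >= min_len:
                    results.append((frame, segment))
            start += len(segment) + 1  # step over the stop codon
    return results


def to_fasta(records: Dict[str, List[Tuple[int, str]]]) -> str:
    """Write junction peptides as FASTA; the header layout is a made-up convention."""
    lines = []
    for event_id, peptides in records.items():
        for n, (frame, pep) in enumerate(peptides, 1):
            lines.append(f">{event_id}|frame{frame}|ijp{n}")
            lines.append(pep)
    return "\n".join(lines) + "\n"
```

For an exon skipping event, `upstream` and `downstream` would be the exons flanking the skipped exon, so the resulting peptides exist only in the skipping isoform. Protein sequence databases for MS/MS search engines are commonly distributed as FASTA, which is why the sketch emits that format. Translating all three forward frames is a simplification; when the annotated reading frame is known, a pipeline would usually keep only that frame to shrink the search space.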
This paper presents a bioinformatics workflow that uses RNA-Seq data to discover novel alternative splicing biomarkers in the breast cancer proteome. As a complement to the synthetic alternative splicing database technique for alternative splicing identification, this method combines the advantages of two platforms, mass spectrometry and next-generation sequencing, and can help identify potentially highly sample-specific alternative splicing isoform biomarkers at an early stage of cancer.