Mo Fan, Hong Xu, Gao Feng, Du Lin, Wang Jun, Omenn Gilbert S, Lin Biaoyang
Systems Biology Division, Zhejiang-California Nanosystems Institute (ZCNI) of Zhejiang University, Zhejiang University Huajiachi Campus, Hangzhou, PR China.
BMC Bioinformatics. 2008 Dec 16;9:537. doi: 10.1186/1471-2105-9-537.
Alternative splicing is an important gene regulation mechanism. An estimated 74% of multi-exon human genes undergo alternative splicing. High-throughput tandem mass spectrometry (MS/MS) provides valuable information for rapidly identifying potentially novel alternatively spliced protein products from experimental datasets. However, the ability to identify alternative splicing events through tandem mass spectrometry depends on the database against which the spectra are searched.
We wrote scripts using Perl, BioPerl, MySQL, and the Ensembl API to build, from the Ensembl Core Database, a theoretical exon-exon junction protein database that accounts for all possible combinations of exons within a gene while preserving the translation frame (i.e., keeping only in-phase exon-exon combinations). Using our liver cancer MS/MS dataset, we identified a total of 488 non-redundant peptides that represent putative exon skipping events.
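The in-phase filtering step can be illustrated with a minimal sketch. This is written in Python for brevity and is not the authors' Perl/Ensembl API code. It assumes each exon carries a start phase and an end phase in the range 0 to 2, modeled on Ensembl's exon phase convention, and treats an upstream/downstream pair as in-phase when the upstream end phase equals the downstream start phase. The `Exon` class, its field names, `in_phase_junctions`, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Exon:
    """Hypothetical exon record; field names are illustrative only."""
    rank: int       # 1-based order of the exon within the gene
    phase: int      # frame offset at the exon start (0, 1 or 2)
    end_phase: int  # frame offset at the exon end (0, 1 or 2)


CODING_PHASES = (0, 1, 2)


def in_phase_junctions(exons):
    """Return (upstream_rank, downstream_rank, skips_exons) for every
    ordered exon pair whose junction keeps the translation frame.

    Assumption: a pair is in-phase when the upstream end phase equals
    the downstream start phase. Exons with a phase outside 0..2
    (for example non-coding exons) are ignored.
    """
    ordered = sorted(exons, key=lambda e: e.rank)
    junctions = []
    for up, down in combinations(ordered, 2):
        if up.end_phase not in CODING_PHASES or down.phase not in CODING_PHASES:
            continue
        if up.end_phase == down.phase:
            # A rank gap greater than 1 means intervening exons are skipped.
            skips_exons = (down.rank - up.rank) > 1
            junctions.append((up.rank, down.rank, skips_exons))
    return junctions


# Made-up example gene with four coding exons.
EXAMPLE_EXONS = [
    Exon(rank=1, phase=0, end_phase=1),
    Exon(rank=2, phase=1, end_phase=0),
    Exon(rank=3, phase=0, end_phase=1),
    Exon(rank=4, phase=1, end_phase=0),
]

if __name__ == "__main__":
    for up, down, skips in in_phase_junctions(EXAMPLE_EXONS):
        label = "exon skipping" if skips else "adjacent"
        print(f"exon {up} -> exon {down}: {label}")
```

In this made-up gene, the only frame-preserving skipping junction joins exon 1 to exon 4; the pairs 1-3 and 2-4 are discarded because they would shift the reading frame. A full pipeline would then translate the sequence spanning each retained junction into the peptide entries that spectra are searched against, which this sketch leaves out.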
Our exon-exon junction database provides the scientific community with an efficient means to identify novel alternatively spliced (exon skipping) protein isoforms using mass spectrometry data. This database will be useful in annotating genome structures using rapidly accumulating proteomics data.