Bioinformatics Research Center, Center for Integrated Fungal Research, and W.M. Keck FT-ICR-MS Laboratory, Department of Chemistry, North Carolina State University, Raleigh, North Carolina 27695, USA.
J Proteome Res. 2010 Mar 5;9(3):1209-17. doi: 10.1021/pr900602d.
Identification of proteins from proteolytic peptides or intact proteins plays an essential role in proteomics. Researchers use search engines to match the acquired peptide sequences to target proteins. However, search engines depend on protein databases to provide the candidates they consider. Alternative splicing (AS), the mechanism by which the exons of pre-mRNAs are spliced and rearranged to generate distinct mRNAs and therefore distinct protein variants, enables higher eukaryotic organisms, with only a limited number of genes, to achieve the requisite complexity and diversity at the proteome level. Multiple alternative isoforms from one gene often share common sequence segments. However, many protein databases include only a limited number of isoforms in order to minimize redundancy. As a result, a database search might fail to identify a target protein even with high-quality tandem MS data and an accurate intact precursor ion mass. To investigate whether an alternative splicing protein database can assign a greater proportion of mass spectrometry data, we computationally predicted an exhaustive list of putative isoforms of Aspergillus flavus proteins from 20 371 expressed sequence tags. The newly constructed AS database provided 9807 new alternatively spliced variants in addition to 12 832 previously annotated proteins. Searches of an existing tandem MS data set against the AS database identified 29 new proteins encoded by 26 genes. Nine fungal genes appeared to have multiple protein isoforms. In addition to revealing splice variants, the AS database also showed potential to improve genome annotation. In summary, introducing an alternative splicing database helps identify more proteins and reveals more information about a proteome.
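The core argument, that a peptide from an unlisted isoform cannot be matched when the database holds only the annotated form, can be illustrated with a toy sketch. Everything below is hypothetical and is not the authors' pipeline: the exon sequences, the identifiers `geneX.1` and `geneX.2`, the helper names, and the simplified trypsin-like cleavage rule (cut after K or R unless followed by P, no missed cleavages) are assumptions chosen only to make the example self-contained.

```python
def tryptic_peptides(protein, min_len=5):
    """Digest a protein with a simplified trypsin-like rule (assumption):
    cleave after K or R unless the next residue is P; no missed cleavages."""
    peptides, start = set(), 0
    for i, aa in enumerate(protein):
        last = i == len(protein) - 1
        if last or (aa in "KR" and protein[i + 1] != "P"):
            if i + 1 - start >= min_len:
                peptides.add(protein[start:i + 1])
            start = i + 1
    return peptides


def match(peptide, database):
    """Return the sorted IDs of database entries whose digest contains the peptide."""
    return sorted(
        entry_id
        for entry_id, sequence in database.items()
        if peptide in tryptic_peptides(sequence)
    )


# Hypothetical translated exon segments for one toy gene.
exons = ["MSTLEKAVD", "GHRLLPQ", "WKEFNDR"]

full_length = "".join(exons)           # all three exons retained
exon_skipped = exons[0] + exons[2]     # middle exon skipped

annotated_db = {"geneX.1": full_length}
as_db = {**annotated_db, "geneX.2": exon_skipped}

# "AVDWK" spans the new exon 1 / exon 3 junction, so it exists only
# in the exon-skipped isoform; "MSTLEK" is shared by both isoforms.
for peptide in ["MSTLEK", "AVDWK"]:
    print(peptide, match(peptide, annotated_db), match(peptide, as_db))
```

In this toy case the shared peptide matches in both databases, while the junction-spanning peptide is assignable only when the exon-skipped isoform is present, which mirrors the situation the abstract describes for spectra left unassigned by the annotated proteins alone.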