Bland Céline, Hartmann Erica M, Christie-Oleza Joseph A, Fernandez Bernard, Armengaud Jean
CEA, DSV, IBEB, Lab Biochim System Perturb, Bagnols-sur-Cèze, F-30207, France
Mol Cell Proteomics. 2014 May;13(5):1369-81. doi: 10.1074/mcp.O113.032854. Epub 2014 Feb 16.
Given the ease of whole genome sequencing with next-generation sequencers, structural and functional gene annotation is now purely based on automated prediction. However, errors in gene structure are frequent, the correct determination of start codons being one of the main concerns. Here, we combine protein N termini derivatization using (N-Succinimidyloxycarbonylmethyl)tris(2,4,6-trimethoxyphenyl)phosphonium bromide (TMPP Ac-OSu) as a labeling reagent with the COmbined FRActional DIagonal Chromatography (COFRADIC) sorting method to enrich labeled N-terminal peptides for mass spectrometry detection. Protein digestion was performed in parallel with three proteases to obtain a reliable automatic validation of protein N termini. The analysis of these N-terminal enriched fractions by high-resolution tandem mass spectrometry allowed the annotation refinement of 534 proteins of the model marine bacterium Roseobacter denitrificans OCh114. This study is especially efficient regarding mass spectrometry analytical time. From the 534 validated N termini, 480 confirmed existing gene annotations, 41 highlighted erroneous start codon annotations, five revealed totally new mis-annotated genes; the mass spectrometry data also suggested the existence of multiple start sites for eight different genes, a result that challenges the current view of protein translation initiation. Finally, we identified several proteins for which classical genome homology-driven annotation was inconsistent, questioning the validity of automatic annotation pipelines and emphasizing the need for complementary proteomic data. All data have been deposited to the ProteomeXchange with identifier PXD000337.
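The sorting of validated N termini into the categories reported above (confirmed annotation, erroneous start codon, new gene, multiple start sites) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `ObservedStart` record, the `codon_offset` field, the category labels, and the toy data are all hypothetical names introduced for illustration, and the sketch ignores practical details such as N-terminal methionine removal and the cross-validation between the three protease digests.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ObservedStart:
    """A validated N terminus mapped to the genome (hypothetical record layout)."""
    gene_id: Optional[str]  # annotated gene hit, or None when no gene is annotated there
    codon_offset: int       # observed start minus annotated start, in codons


def classify_genes(observations: Iterable[ObservedStart]) -> Counter:
    """Count genes per category; the labels are illustrative assumptions."""
    offsets_by_gene = defaultdict(set)
    tally = Counter()
    for obs in observations:
        if obs.gene_id is None:
            # Simplification: each unannotated hit is counted as one new gene.
            tally["new_gene"] += 1
        else:
            offsets_by_gene[obs.gene_id].add(obs.codon_offset)
    for offsets in offsets_by_gene.values():
        if len(offsets) > 1:
            tally["multiple_starts"] += 1   # more than one distinct start observed
        elif offsets == {0}:
            tally["confirmed"] += 1         # observed start matches the annotation
        else:
            tally["start_corrected"] += 1   # single start, but not the annotated one
    return tally


# Toy data, invented for this example only.
toy = [
    ObservedStart("geneA", 0),
    ObservedStart("geneB", -12),
    ObservedStart("geneC", 0),
    ObservedStart("geneC", 7),
    ObservedStart(None, 0),
]
print(classify_genes(toy))
```

On the reported figures, 480 + 41 + 5 = 526, which leaves 8 of the 534 validated N termini. That remainder matches the eight genes with multiple start sites, if those are counted separately from the other three categories, which the abstract does not state explicitly.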