Gallien Sébastien, Perrodou Emmanuel, Carapito Christine, Deshayes Caroline, Reyrat Jean-Marc, Van Dorsselaer Alain, Poch Olivier, Schaeffer Christine, Lecompte Odile
Laboratoire de Spectrométrie de Masse Bio-Organique, IPHC-DSA, ULP, CNRS, UMR7178, 67 087 Strasbourg, France.
Genome Res. 2009 Jan;19(1):128-35. doi: 10.1101/gr.081901.108. Epub 2008 Oct 27.
Progress in sequencing technologies is supplying biology with an ever-increasing number of genome sequences. In most cases, the gene repertoire is predicted in silico and conceptually translated into proteins. As recently highlighted, predicted genes frequently contain errors, particularly in their start codons, and these errors seriously affect subsequent biological studies. A new "ortho-proteogenomic" approach is presented here for refining the annotation of multiple genomes at once. It combines comparative genomics with a novel proteomic protocol that characterizes both N-terminal and internal peptides in a single experiment. Applied to the genus Mycobacterium, with Mycobacterium smegmatis as the reference, this strategy identified 946 distinct proteins, including 443 with characterized N termini. These experimental data allowed the correction of 19% of the characterized start codons, the identification of 29 proteins missed during the annotation process, and, through comparative genomics, the curation of 4328 sequences from 16 other Mycobacterium proteomes.
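The start-codon correction step can be illustrated with a minimal sketch. It assumes an uppercase, plus-strand A/C/G/T nucleotide sequence, an annotated start position in the correct reading frame, the three most common bacterial start codons (ATG, GTG, TTG), and an N-terminal peptide already identified by mass spectrometry whose initiator methionine may or may not have been removed. The helper names (`translate_from`, `candidate_starts`, `start_from_n_terminus`) are hypothetical; this is not the authors' pipeline, which the abstract does not describe at this level.

```python
# Hypothetical sketch: reassign a start codon from an observed protein N terminus.
# Assumes an uppercase, plus-strand A/C/G/T sequence; minus-strand genes would
# need to be reverse-complemented first.

BASES = "TCAG"
# Standard genetic code in TCAG order. The bacterial code (NCBI table 11) has
# the same amino-acid assignments and differs only in its allowed start codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Assumed set of start codons (the most frequent ones in bacteria).
START_CODONS = {"ATG", "GTG", "TTG"}


def translate_from(seq, pos):
    """Translate seq from pos up to the first in-frame stop codon."""
    protein = []
    for i in range(pos, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    if protein:
        protein[0] = "M"  # GTG/TTG starts are still read as initiator Met
    return "".join(protein)


def candidate_starts(seq, annotated_start):
    """List in-frame start codons in the open reading frame around annotated_start."""
    p = annotated_start
    # Walk upstream in frame until an in-frame stop codon (or the sequence start).
    while p - 3 >= 0 and CODON_TABLE[seq[p - 3:p]] != "*":
        p -= 3
    # Walk downstream, collecting start codons, until the first in-frame stop.
    starts = []
    while p + 3 <= len(seq):
        codon = seq[p:p + 3]
        if CODON_TABLE[codon] == "*":
            break
        if codon in START_CODONS:
            starts.append(p)
        p += 3
    return starts


def start_from_n_terminus(seq, annotated_start, peptide):
    """Return the most upstream start consistent with the N-terminal peptide, or None."""
    for p in candidate_starts(seq, annotated_start):
        protein = translate_from(seq, p)
        # Accept the peptide with the initiator Met retained or cleaved off.
        if protein.startswith(peptide) or protein[1:].startswith(peptide):
            return p
    return None
```

For example, if the annotated start is an internal ATG but the observed N-terminal peptide begins upstream at a GTG, the function returns the GTG position. A `None` result means no in-frame start in that open reading frame explains the peptide; in a real workflow such a case would need further inspection rather than automatic correction.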