Ferro Myriam, Tardif Marianne, Reguer Erwan, Cahuzac Romain, Bruley Christophe, Vermat Thierry, Nugues Estelle, Vigouroux Marielle, Vandenbrouck Yves, Garin Jérôme, Viari Alain
CEA, DSV, iRTSV, Laboratoire d'Etude de la Dynamique des Protéomes, Grenoble, F-38054, France.
J Proteome Res. 2008 May;7(5):1873-83. doi: 10.1021/pr070415k. Epub 2008 Mar 19.
PepLine is a fully automated software tool that maps MS/MS fragmentation spectra of tryptic peptides to genomic DNA sequences. The approach is based on Peptide Sequence Tags (PSTs) obtained from partial interpretation of QTOF MS/MS spectra (first module). PSTs are then mapped onto the six-frame translations of genomic sequences (second module), yielding hits. Hits are then clustered to detect potential coding regions (third module). Our work aimed at optimizing the algorithms of each component so that the whole pipeline can run in a fully automated manner on raw nucleic acid sequences (i.e., genomes that have not been "reduced" to a database of ORFs or putative exon sequences). The whole pipeline was tested on controlled MS/MS spectra sets from standard proteins and from Arabidopsis thaliana chloroplast envelope samples. Our results demonstrate that PepLine was competitive with protein database search software and was fast enough to potentially tackle large data sets and/or large genomes. We also illustrate the potential of this approach for detecting the intron/exon structure of genes.
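The second and third modules can be illustrated with a minimal sketch. It is not PepLine's implementation: it assumes a PST is reduced to its amino acid string (real PSTs also carry flanking masses), uses exact string matching on the six-frame translation, and clusters hits with a simple same-strand distance threshold. The function names and the `max_gap` parameter are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    # Unknown codons (e.g. containing N) become 'X'.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Yield (strand, offset, protein) for the six reading frames."""
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            yield strand, offset, translate(seq[offset:])


def map_tag(tag, dna):
    """Return hits as (strand, start, end), 0-based half-open,
    always expressed in forward-strand coordinates."""
    hits = []
    n = len(dna)
    for strand, offset, protein in six_frames(dna):
        pos = protein.find(tag)
        while pos != -1:
            start = offset + 3 * pos
            end = start + 3 * len(tag)
            if strand == "-":
                start, end = n - end, n - start
            hits.append((strand, start, end))
            pos = protein.find(tag, pos + 1)
    return hits


def cluster_hits(hits, max_gap=100):
    """Group same-strand hits separated by at most max_gap bases
    (assumed clustering rule, for illustration only)."""
    clusters = []
    for hit in sorted(hits, key=lambda h: (h[0], h[1])):
        last = clusters[-1] if clusters else None
        if (last and last[-1][0] == hit[0]
                and hit[1] - max(h[2] for h in last) <= max_gap):
            last.append(hit)
        else:
            clusters.append([hit])
    return clusters


# The tag AKV is encoded by GCTAAAGTT at positions 5..14 of this sequence.
print(map_tag("AKV", "CCATGGCTAAAGTTG"))  # [('+', 5, 14)]
```

In this sketch a cluster is only a list of nearby hits on one strand. Several clusters separated by long gaps but supported by peptides of the same protein would be the kind of signal that hints at an intron/exon structure.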