Grossmann Jonas, Fischer Bernd, Baerenfaller Katja, Owiti Judith, Buhmann Joachim M, Gruissem Wilhelm, Baginsky Sacha
Institute of Plant Sciences, ETH Zurich, Universitätstrasse 2, Zurich, Switzerland.
Proteomics. 2007 Dec;7(23):4245-54. doi: 10.1002/pmic.200700474.
We present and evaluate a strategy for the mass spectrometric identification of proteins from organisms for which no genome sequence information is available; the strategy incorporates cross-species information from sequenced organisms. The method combines spectrum quality scoring, de novo sequencing, and error-tolerant BLAST searches, and is designed to decrease input data complexity. Spectrum quality scoring reduces the number of mass spectra to be investigated without a loss of information. Stringent quality-based selection and the combination of different de novo sequencing methods substantially increase the catalog of significant peptide alignments. De novo sequences that pass a reliability filter are then submitted to error-tolerant BLAST searches, and the MS-BLAST hits are validated by a sampling technique. With this workflow, we identified up to 20% more groups of homologous proteins in proteome analyses of organisms whose genomes are not sequenced than with state-of-the-art database searches against an Arabidopsis thaliana database. We consider the novel data analysis workflow an excellent screening method for identifying proteins that evade detection in proteomics experiments as a result of database constraints.
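The two filtering stages of the workflow (quality-based spectrum selection, then a reliability filter on pooled de novo sequences) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Spectrum` and `DeNovoCandidate` types, the score ranges, the thresholds, and the function names are all hypothetical, and the abstract does not specify how the scores are computed. The MS-BLAST search and the sampling-based validation are left out because the abstract gives no details on them.

```python
from dataclasses import dataclass


@dataclass
class Spectrum:
    spectrum_id: str
    quality: float  # hypothetical spectrum quality score, assumed in [0, 1]


@dataclass
class DeNovoCandidate:
    spectrum_id: str
    sequence: str
    reliability: float  # hypothetical per-sequence reliability, assumed in [0, 1]
    tool: str  # name of the de novo sequencing tool (placeholder labels)


def select_spectra(spectra, min_quality):
    """Keep only spectra whose quality score reaches the threshold."""
    return [s for s in spectra if s.quality >= min_quality]


def select_queries(candidates, kept_ids, min_reliability):
    """Pool candidates from all tools and return unique sequences that
    belong to a retained spectrum and pass the reliability filter."""
    seen = set()
    queries = []
    for c in candidates:
        if c.spectrum_id not in kept_ids:
            continue  # spectrum was discarded by the quality filter
        if c.reliability < min_reliability:
            continue  # sequence is not reliable enough to search
        if c.sequence in seen:
            continue  # same sequence already proposed by another tool
        seen.add(c.sequence)
        queries.append(c.sequence)
    return queries


if __name__ == "__main__":
    spectra = [Spectrum("s1", 0.9), Spectrum("s2", 0.2), Spectrum("s3", 0.7)]
    candidates = [
        DeNovoCandidate("s1", "LSSPATLNSR", 0.8, "toolA"),
        DeNovoCandidate("s1", "LSSPATLNSR", 0.9, "toolB"),
        DeNovoCandidate("s2", "AAAK", 0.95, "toolA"),
        DeNovoCandidate("s3", "VGDANPALQK", 0.3, "toolB"),
        DeNovoCandidate("s3", "ELLQK", 0.6, "toolA"),
    ]
    kept = select_spectra(spectra, min_quality=0.5)
    kept_ids = {s.spectrum_id for s in kept}
    print(select_queries(candidates, kept_ids, min_reliability=0.5))
```

In this sketch the surviving sequences would be the queries handed to an error-tolerant homology search. Filtering before the search is what keeps the input small, which matches the stated design goal of decreasing input data complexity.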