Küster B, Mortensen P, Andersen J S, Mann M
Protein Interaction Laboratory (PIL), University of Southern Denmark, Odense M, Denmark. MDS-Proteomics, Odense M, Denmark.
Proteomics. 2001 May;1(5):641-50. doi: 10.1002/1615-9861(200104)1:5<641::AID-PROT641>3.0.CO;2-R.
Proteome projects seek to provide systematic functional analysis of the genes uncovered by genome sequencing initiatives. Mass spectrometric protein identification is a key requirement in these studies, but to date database searching tools rely on the availability of protein sequences derived from full-length cDNA, expressed sequence tags or predicted open reading frames (ORFs) from genomic sequences. We demonstrate here that proteins can be identified directly in large genomic databases using peptide sequence tags obtained by tandem mass spectrometry. Against a background of vast amounts of noncoding DNA sequence, the identified peptides localize coding sequences (exons) to a confined region of the genome, which contains the cognate gene. The approach does not require prior information about putative ORFs as predicted by computerized gene finding algorithms. The method scales to the complete human genome and allows the identification, mapping and cloning of any protein for which minimal mass spectrometric information can be obtained, and it can assist gene prediction for that protein. Several novel proteins from Arabidopsis thaliana and human have been discovered in this way.
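The idea of locating peptides directly in untranslated genomic sequence can be illustrated with a minimal Python sketch. It is not the authors' software, and it simplifies heavily: it matches only the amino acid stretch of each tag against a six-frame translation, treats isoleucine and leucine as interchangeable because they have the same mass, and ignores the flanking masses of a full peptide sequence tag, mass tolerances, and peptides that span an exon boundary. The names `find_tag_hits` and `localize`, and the `window` parameter, are hypothetical and chosen only for this illustration.

```python
from itertools import product
from typing import NamedTuple

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Hit(NamedTuple):
    tag: str      # the query sequence as supplied
    strand: str   # "+" or "-"
    start: int    # 0-based start on the forward strand
    end: int      # exclusive end on the forward strand


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def find_tag_hits(genome: str, tags: list[str]) -> list[Hit]:
    """Search all six reading frames for each tag (I and L treated as equal)."""
    genome = genome.upper()
    length = len(genome)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:]).replace("I", "L")
            for tag in tags:
                query = tag.upper().replace("I", "L")
                pos = protein.find(query)
                while pos != -1:
                    nt_start = frame + 3 * pos
                    nt_end = nt_start + 3 * len(query)
                    if strand == "-":
                        # Map reverse-strand coordinates back to the forward strand.
                        nt_start, nt_end = length - nt_end, length - nt_start
                    hits.append(Hit(tag, strand, nt_start, nt_end))
                    pos = protein.find(query, pos + 1)
    return hits


def localize(hits: list[Hit], window: int = 10_000) -> list[tuple[int, int, int]]:
    """Merge hits separated by at most `window` bases into candidate loci.

    Returns (start, end, number_of_distinct_tags) for each locus.
    """
    loci = []
    for hit in sorted(hits, key=lambda h: h.start):
        if loci and hit.start - loci[-1][1] <= window:
            start, end, tags = loci[-1]
            loci[-1] = (start, max(end, hit.end), tags | {hit.tag})
        else:
            loci.append((hit.start, hit.end, {hit.tag}))
    return [(start, end, len(tags)) for start, end, tags in loci]
```

In this sketch, a region where several distinct tags from the same protein cluster is a candidate location for the cognate gene, whereas an isolated hit from a short tag is more likely to be a chance match in noncoding sequence.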