Yates J R
Department of Molecular Biotechnology, School of Medicine, University of Washington, Seattle 98195-7730, USA.
Electrophoresis. 1998 May;19(6):893-900. doi: 10.1002/elps.1150190604.
Large-scale DNA sequencing is creating a sequence infrastructure of great benefit to protein biochemistry. Concurrent with the application of large-scale DNA sequencing to whole genome analysis, mass spectrometry has attained the capability to determine the molecular weights and amino acid sequences of peptides rapidly and with remarkable sensitivity. Computer algorithms have been developed that use the two different types of data generated by mass spectrometers to search sequence databases. The first type is the peptide mass map. When a protein is digested with a site-specific protease, the molecular weights of the resulting collection of peptides, the mass map or fingerprint, can be determined using mass spectrometry. This set of molecular weights can then be used to identify the protein, and several different approaches to doing so have been developed. Protein identification using peptide mass mapping is an effective technique when studying organisms with completed genomes. A second method is based on the use of data created by tandem mass spectrometers. Tandem mass spectra contain highly specific information in the fragmentation pattern as well as sequence information. This information has been used to search databases of translated protein sequences as well as nucleotide databases such as expressed sequence tag (EST) sequences. The ability to search nucleotide databases is an advantage when analyzing data obtained from organisms whose genomes are not yet completed but for which a large amount of expressed gene sequence is available (e.g., human and mouse). Furthermore, a strength of using tandem mass spectra to search databases is the ability to identify proteins present in fairly complex mixtures.
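The peptide mass mapping idea described above can be illustrated with a minimal sketch. It is not any published search program. It assumes trypsin as the site-specific protease (cleavage after K or R, but not before P), unmodified peptides, standard monoisotopic residue masses, and a simple count of matching masses as the score. The function names, the tolerance, and the two toy database sequences are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(seq):
    """Split after K or R unless the next residue is P (assumed trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]


def peptide_mass(pep):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[a] for a in pep) + WATER


def score(observed, seq, tol=0.5):
    """Count observed masses that match some theoretical peptide within tol Da."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(seq)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)


def identify(observed, database, tol=0.5):
    """Return the name of the database entry with the most matching masses."""
    return max(database, key=lambda name: score(observed, database[name], tol))


# Toy database with made-up sequences, for illustration only.
database = {"protA": "MAGKTLLR", "protB": "GASPVKCWYR"}
observed = [peptide_mass(p) for p in tryptic_digest(database["protB"])]
best = identify(observed, database)
```

Here `best` is the entry whose predicted mass map agrees with the most observed masses. Practical tools would also need to handle missed cleavages, modifications, and a statistically grounded score, none of which this sketch attempts.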
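The tandem mass spectrometry approach relies on the fragmentation pattern of a single peptide. As a rough illustration, the sketch below predicts singly charged b- and y-type fragment ions for candidate peptides and picks the candidate that shares the most peaks with an observed spectrum. The shared-peak count is a deliberately simple stand-in, not the scoring used by any particular search program, and the function names, the tolerance, and the candidate peptides are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(pep):
    """Singly charged b and y ion m/z values for an unmodified peptide."""
    masses = [MONO[a] for a in pep]
    b, y = [], []
    running = 0.0
    for m in masses[:-1]:            # b ions grow from the N-terminus
        running += m
        b.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):   # y ions grow from the C-terminus
        running += m
        y.append(running + WATER + PROTON)
    return {"b": b, "y": y}


def shared_peaks(spectrum, pep, tol=0.5):
    """Count observed peaks within tol of any predicted b or y ion."""
    theo = fragment_ions(pep)
    ions = theo["b"] + theo["y"]
    return sum(any(abs(mz - t) <= tol for t in ions) for mz in spectrum)


def best_match(spectrum, candidates, tol=0.5):
    """Return the candidate peptide sharing the most peaks with the spectrum."""
    return max(candidates, key=lambda p: shared_peaks(spectrum, p, tol))


# Toy example: the "observed" spectrum is the y-ion series of one candidate.
spectrum = fragment_ions("TLLR")["y"]
match = best_match(spectrum, ["MAGK", "TLLR"])
```

In a real search the candidate list would typically be restricted first, for example to database peptides whose mass agrees with the measured precursor mass. Because each spectrum is matched to its own peptide, spectra from different proteins in a mixture can be assigned independently.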