Yates J R, Speicher S, Griffin P R, Hunkapiller T
Department of Molecular Biotechnology, School of Medicine, University of Washington, Seattle 98195.
Anal Biochem. 1993 Nov 1;214(2):397-408. doi: 10.1006/abio.1993.1514.
A computer searching algorithm has been used to identify protein sequences in the Protein Information Resource (PIR) database with peptide mass information (mass map) obtained from proteolytic digests of proteins analyzed by microcapillary high-performance liquid chromatography electrospray ionization mass spectrometry. A theoretical analysis of the cytochrome c family demonstrates the ability to identify protein sequences in the PIR database with a high degree of accuracy using a set of six predicted tryptic peptide masses. This method was also applied to experimentally determined peptide masses for a small GTP-binding protein, a protein from pig uterus, the human sex steroid binding protein, and a thermostable DNA polymerase. The results demonstrate that a set of observed masses which is less than 50% of the total number of predicted masses can be used to identify a protein sequence in the database. For the analysis presented in this paper, a mass matching tolerance of 1 amu is used. Under these conditions, mass maps created by fast atom bombardment mass spectrometry and matrix-assisted laser desorption time-of-flight would also be applicable. In cases where multiple matches are observed or verification of the protein identification is needed, tandem mass spectrometry sequencing can be used to establish sequence similarity.
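The mass-map search described above can be illustrated with a minimal sketch. It is not the authors' program: the function names, the toy database, and the scoring (a simple count of observed masses that match a predicted mass) are assumptions made for illustration. The sketch also assumes monoisotopic neutral peptide masses, cleavage after K or R except before P, and no missed cleavages, none of which the abstract specifies. Only the 1 amu matching tolerance comes from the abstract.

```python
# Monoisotopic residue masses in amu (assumption: the abstract does not say
# whether monoisotopic or average masses were used).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the terminal H and OH


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (assumed rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER


def count_matches(observed, predicted, tolerance=1.0):
    """Count observed masses lying within `tolerance` amu of a predicted mass."""
    return sum(
        any(abs(obs - pred) <= tolerance for pred in predicted)
        for obs in observed
    )


def rank_proteins(observed, database, tolerance=1.0):
    """Return (name, match_count) pairs sorted by match count, best first."""
    scores = []
    for name, sequence in database.items():
        predicted = [peptide_mass(p) for p in tryptic_peptides(sequence)]
        scores.append((name, count_matches(observed, predicted, tolerance)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical two-entry database; real searches ran against the PIR database.
toy_database = {
    "protA": "ACDEFGHIKLMNPQRSTVWYK",
    "protB": "GGSGGKAAAARVVVVK",
}
# Two of protA's three predicted masses, shifted to mimic measurement error.
observed_masses = [
    peptide_mass("ACDEFGHIK") + 0.3,
    peptide_mass("STVWYK") - 0.4,
]
ranking = rank_proteins(observed_masses, toy_database)
```

In this toy case only a subset of the predicted peptides is observed, yet the correct entry still ranks first, which mirrors the abstract's point that fewer than 50% of the predicted masses can suffice. A tie between top entries would correspond to the multiple-match case for which the abstract suggests tandem mass spectrometry sequencing.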