USDA-ARS, Western Regional Research Center, 800 Buchanan St., Albany, CA 94710, USA.
Phytochemistry. 2011 Jul;72(10):1154-61. doi: 10.1016/j.phytochem.2011.01.002. Epub 2011 Feb 1.
While tandem mass spectrometry (MS/MS) is routinely used to identify proteins from complex mixtures, certain types of proteins present unique challenges for MS/MS analyses. The major wheat gluten proteins, gliadins and glutenins, are particularly difficult to distinguish by MS/MS. Each of these groups contains many individual proteins with similar sequences that include repetitive motifs rich in proline and glutamine. These proteins have few tryptic cleavage sites, so digestion often yields only one or two tryptic peptides, which may not provide sufficient information for identification. Additionally, there are fewer than 14,000 complete protein sequences from wheat in the current NCBInr release. In this paper, MS/MS methods were optimized for the identification of the wheat gluten proteins. Chymotrypsin and thermolysin, as well as trypsin, were used to digest the proteins, and the collision energy was adjusted to improve fragmentation of chymotryptic and thermolytic peptides. Specialized databases were constructed that included protein sequences derived from contigs from several assemblies of wheat expressed sequence tags (ESTs), including contigs assembled from ESTs of the cultivar under study. Two different search algorithms were used to interrogate the database, and the results were analyzed and displayed using a commercially available software package (Scaffold). We examined the effect of protein database content and size on the false discovery rate. We found that the number of proteins identified decreased as database size increased above 30,000 sequences. The type of decoy database also influenced the number of proteins identified. Using three enzymes, two search algorithms and a specialized database allowed us to greatly increase the number of detected peptides and to distinguish proteins within each gluten protein group.
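The point about scarce tryptic sites can be illustrated with a small in silico digestion. The sketch below is not the authors' pipeline: the cleavage rules are simplified textbook approximations, `TOY_SEQUENCE` is an invented proline- and glutamine-rich sequence rather than a real gliadin or glutenin, and the names `RULES` and `digest` are hypothetical.

```python
# Simplified cleavage rules (assumptions for illustration, not the paper's settings).
# Side "C": cut after the listed residue; side "N": cut before it.
RULES = {
    "trypsin": ("KR", "C"),
    "chymotrypsin": ("FYWL", "C"),
    "thermolysin": ("AFILMV", "N"),
}

# Invented proline/glutamine-rich toy sequence; NOT a real gluten protein.
TOY_SEQUENCE = "PQQPFSQQQQPYLQQPQQFIQPQQRSQQPFLQQK"


def digest(sequence, enzyme):
    """Return peptides from a complete digestion with no missed cleavages."""
    residues, side = RULES[enzyme]
    cut_points = []
    for i, aa in enumerate(sequence):
        if aa not in residues:
            continue
        if side == "C":
            # Cut between positions i and i+1, unless the next residue is proline.
            if i + 1 < len(sequence) and sequence[i + 1] != "P":
                cut_points.append(i + 1)
        elif i > 0:
            # Cut between positions i-1 and i.
            cut_points.append(i)
    bounds = [0] + cut_points + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    for enzyme in RULES:
        peptides = digest(TOY_SEQUENCE, enzyme)
        print(f"{enzyme}: {len(peptides)} peptides")
```

On a sequence like this, trypsin finds almost nowhere to cut, while the other two enzymes cut at the hydrophobic residues scattered through the repeats. Real search engines also allow missed cleavages and use more detailed specificity rules, which this sketch leaves out.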