Department of Statistics, Seoul National University, Korea.
BMC Bioinformatics. 2011 Oct 28;12:423. doi: 10.1186/1471-2105-12-423.
Quantification of protein expression by mass spectrometry (MS) has been introduced in various proteomics studies. In particular, two label-free quantification methods, spectral counting and spectral feature analysis, have been extensively investigated. The cornerstone of both methods is peptide identification based on a proteomic database search and subsequent estimation of peptide retention time. However, they often suffer from a restrictive database search and inaccurate estimation of the liquid chromatography (LC) retention time. Furthermore, conventional peptide identification methods based on database or spectral library search algorithms, such as SEQUEST or SpectraST, have been found to provide neither the best match nor high-scored matches. Lastly, these methods are limited in that target peptides cannot be identified unless they have been previously generated and stored in the database or spectral libraries.

To overcome these limitations, we propose a novel method, Quantification method based on Finding the Identical Spectral set for a Homogenous peptide (Q-FISH), which estimates a peptide's abundance from its tandem mass spectrometry (MS/MS) spectra through direct comparison of experimental spectra. Intuitively, Q-FISH compares all possible pairs of experimental spectra in order to identify both known and novel proteins, significantly enhancing identification accuracy by grouping replicated spectra from the same peptide targets.
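The idea of comparing all pairs of experimental spectra and grouping replicates can be sketched as follows. This is only an illustration, not the Q-FISH implementation: the abstract does not state the similarity measure, the threshold, or the grouping rule, so the cosine similarity on binned peaks, the 0.8 cutoff, the 1.0 m/z bin width, the single-linkage grouping, and all function names here are assumptions. A realistic version would presumably also compare precursor masses before comparing fragment peaks.

```python
import math
from itertools import combinations


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (sparse dict)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_spectra(spectra, threshold=0.8, bin_width=1.0):
    """Group spectra linked by a pairwise similarity >= threshold.

    Assumed grouping rule: single linkage via union-find, so two spectra
    share a cluster if a chain of similar pairs connects them.
    """
    binned = [bin_spectrum(s, bin_width) for s in spectra]
    parent = list(range(len(spectra)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair of experimental spectra.
    for i, j in combinations(range(len(spectra)), 2):
        if cosine_similarity(binned[i], binned[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(spectra)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical toy spectra as (m/z, intensity) pairs; not data from the study.
toy_spectra = [
    [(100.2, 10.0), (200.5, 5.0), (300.1, 1.0)],
    [(100.4, 9.0), (200.7, 6.0), (300.3, 1.0)],
    [(150.0, 10.0), (250.0, 4.0)],
]
toy_clusters = cluster_spectra(toy_spectra)
```

The size of each resulting cluster then plays the role of a spectral count for the peptide that the cluster represents, without any database lookup. The all-pairs loop is quadratic in the number of spectra, which is the cost of not relying on a reference library.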
We applied Q-FISH to nano-LC-MS/MS data obtained from human hepatocellular carcinoma (HCC) and normal liver tissue samples to identify differentially expressed peptides between the normal and disease samples. For a total of 44,318 spectra obtained through MS/MS analysis, Q-FISH yielded 14,747 clusters. Among these, 5,777 clusters were identified only in the HCC sample, 6,648 clusters only in the normal tissue sample, and 2,323 clusters in both the HCC and normal tissue samples. While it would be interesting to investigate peptide clusters found in only one sample, we further examined the spectral clusters identified in both the HCC and normal samples, since our goal is to identify and quantitatively assess differentially expressed peptides. The next step was to perform a beta-binomial test to isolate differentially expressed peptides between the HCC and normal tissue samples. This test yielded 84 peptides with significantly different spectral counts between the HCC and normal tissue samples. We independently identified 50 and 95 peptides by SEQUEST, of which 24 and 56 peptides, respectively, were found to be known biomarkers for human liver cancer. Comparing the Q-FISH and SEQUEST results, we found that 22 of the 84 differentially expressed peptides found by Q-FISH were also identified by SEQUEST. Remarkably, of these 22 peptides discovered by both Q-FISH and SEQUEST, 13 are known biomarkers for human liver cancer and the remaining 9 are known to be associated with other cancers.
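To make the beta-binomial step more concrete, the sketch below computes a two-sided p-value for one cluster's spectral counts. It is a simplified stand-in, not the test used in the study: the abstract gives no details of the test, so the null proportion `mean`, the overdispersion `rho`, the rule of summing all outcomes no more likely than the observed one, and the function names are assumptions, and the example counts are hypothetical.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(K = k) for a beta-binomial with n trials and shapes alpha, beta."""
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return math.exp(log_choose
                    + log_beta(k + alpha, n - k + beta)
                    - log_beta(alpha, beta))


def beta_binomial_pvalue(k, n, mean, rho):
    """Two-sided p-value for observing k HCC spectra out of n in a cluster.

    mean: assumed null proportion of HCC spectra (0 < mean < 1).
    rho:  assumed overdispersion, rho = 1 / (alpha + beta + 1), 0 < rho < 1.
    The p-value sums the probabilities of all outcomes that are no more
    likely than the observed count under the null.
    """
    total = (1.0 - rho) / rho              # alpha + beta
    alpha, beta = mean * total, (1.0 - mean) * total
    pmf = [beta_binomial_pmf(i, n, alpha, beta) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties are not lost to rounding.
    return min(1.0, sum(p for p in pmf if p <= observed * (1.0 + 1e-9)))


# Hypothetical cluster: 18 of its 20 spectra come from the HCC sample.
p_skewed = beta_binomial_pvalue(18, 20, mean=0.5, rho=0.1)
# Hypothetical balanced cluster: 10 of 20 spectra from the HCC sample.
p_balanced = beta_binomial_pvalue(10, 20, mean=0.5, rho=0.1)
```

Compared with a plain binomial test, the extra parameter `rho` widens the null distribution, so a cluster needs a more lopsided split of spectral counts before it is called differentially expressed. In practice `mean` and `rho` would have to be estimated from the data, and p-values across the 2,323 shared clusters would need a multiple-testing correction; neither step is shown here.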
We proposed a novel statistical method, Q-FISH, for accurately identifying protein species and simultaneously quantifying the expression levels of the identified peptides from mass spectrometry data. Q-FISH analysis of human HCC and normal liver tissue samples identified many protein biomarkers that are highly relevant to HCC. Q-FISH can be a useful tool for both peptide identification and quantification in mass spectrometry data analysis. It may also prove more effective at discovering novel protein biomarkers than SEQUEST and other standard methods.