Kim Seongho, Carruthers Nicholas, Lee Joohyoung, Chinni Sreenivasa, Stemmer Paul
Biostatistics Core, Karmanos Cancer Institute, Wayne State University, Detroit, MI 48201, USA; Department of Oncology, Wayne State University, Detroit, MI 48201, USA.
Proteomics Core, Karmanos Cancer Institute, Wayne State University, Detroit, MI 48201, USA; Institute of Environmental Health Sciences, Wayne State University, Detroit, MI 48201, USA.
Comput Methods Programs Biomed. 2016 Dec;137:137-148. doi: 10.1016/j.cmpb.2016.09.017. Epub 2016 Sep 22.
Stable isotope labeling by amino acids in cell culture (SILAC) is a practical and powerful approach for quantitative proteomic analysis. A key advantage of SILAC is that the isotopically labeled peptides are detected simultaneously in a single instrument run, which guarantees relative quantitation for a large number of peptides without the variation introduced by separate experiments. However, only a few approaches are available for assessing protein ratios, and none of the existing algorithms pays much attention to proteins that have only one peptide hit.
We introduce new quantitative approaches to SILAC protein-level summarization based on classification methodologies: Gaussian mixture models fitted with the EM algorithm, their Bayesian counterpart, and K-means clustering. In addition, we develop a new approach that combines a Gaussian mixture model with particle swarm optimization (PSO), a stochastic, metaheuristic global optimization algorithm, to avoid premature convergence or becoming stuck in a local optimum.
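To make the classification idea concrete, the following is a minimal sketch of fitting a one-dimensional Gaussian mixture by EM and assigning each ratio to its most probable component. It is not the authors' implementation. The use of log2 ratios, the choice of three components (down, unchanged, up), the initialization, the simulated data, and the function names `log_joint`, `fit_gmm_em`, and `classify` are all assumptions made for illustration.

```python
import numpy as np

def log_joint(x, w, mu, var):
    """Log of (weight * normal density) per point and component, shape (n, k)."""
    return (np.log(w) - 0.5 * np.log(2 * np.pi * var)
            - 0.5 * (x[:, None] - mu) ** 2 / var)

def fit_gmm_em(x, k=3, n_iter=500, tol=1e-8):
    """Fit a 1-D Gaussian mixture with k components by plain EM."""
    x = np.asarray(x, dtype=float)
    # Assumed initialization: means spread evenly over the data range,
    # a common starting variance, and equal weights.
    mu = np.linspace(x.min(), x.max(), k)
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each point.
        lp = log_joint(x, w, mu, var)
        log_norm = np.logaddexp.reduce(lp, axis=1)
        resp = np.exp(lp - log_norm[:, None])
        ll = log_norm.sum()
        if ll - prev_ll < tol:  # stop when the log-likelihood stalls
            break
        prev_ll = ll
        # M-step: responsibility-weighted updates of weights, means, variances.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var

def classify(x, w, mu, var):
    """Assign each point to the component with the highest posterior."""
    return log_joint(np.asarray(x, dtype=float), w, mu, var).argmax(axis=1)

# Simulated log2 ratios (hypothetical data): 30 down, 300 unchanged, 30 up.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.3, 30),
                    rng.normal(0.0, 0.2, 300),
                    rng.normal(2.0, 0.3, 30)])
truth = np.repeat([0, 1, 2], [30, 300, 30])

w, mu, var = fit_gmm_em(x)
labels = classify(x, w, mu, var)
accuracy = (labels == truth).mean()
```

Because every ratio is classified individually, a sketch like this applies equally to a protein with a single peptide ratio. Plain EM can stall in a local optimum when the initialization is poor, which is the weakness a global optimizer such as PSO is meant to address.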
Our simulation studies show that the newly developed PSO-based method achieves the best F1 score among the methods compared, and the proposed methods further demonstrate the ability to detect potential markers in real SILAC experimental data.
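For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for binary labels, where 1 marks a protein called (or simulated as) changed and 0 marks unchanged. The label encoding, the example vectors, and the function name `f1_score` are assumptions for illustration, not taken from the paper.

```python
def f1_score(truth, predicted):
    """F1 score for binary labels (1 = changed, 0 = unchanged)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)        # true positives
    fp = sum(1 for t, p in pairs if not t and p)    # false positives
    fn = sum(1 for t, p in pairs if t and not p)    # false negatives
    if tp == 0:
        return 0.0  # convention when nothing is correctly detected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: 2 true positives, 1 false positive, 1 false negative.
example_truth = [1, 1, 0, 0, 1]
example_predicted = [1, 0, 0, 1, 1]
score = f1_score(example_truth, example_predicted)  # precision = recall = 2/3
```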
The developed approach is applicable regardless of how many peptide hits a protein has, rescuing many proteins that would otherwise be removed. Furthermore, the developed methods require no additional correction for multiple comparisons, which enables direct interpretation of the analysis outcomes.