Banerjee Upamanyu, Braga-Neto Ulisses M
Department of Electrical and Computer Engineering, Center for Bioinformatics and Genomics Systems Engineering, Texas A&M University, College Station, TX, USA.
Cancer Inform. 2017 Jan 9;14(Suppl 5):175-182. doi: 10.4137/CIN.S30798. eCollection 2015.
Proteomics promises to revolutionize cancer treatment and prevention by facilitating the discovery of molecular biomarkers. Progress has been impeded, however, by the small-sample, high-dimensional nature of proteomic data. We propose a Bayesian approach to address this issue in the classification of proteomic profiles generated by liquid chromatography-mass spectrometry (LC-MS). Our approach relies on a previously proposed model of the LC-MS experiment and on the theory of the optimal Bayesian classifier (OBC). Computing the OBC requires combining a likelihood-free methodology called approximate Bayesian computation (ABC) with Markov chain Monte Carlo (MCMC) sampling. Numerical experiments using synthetic LC-MS data based on an actual human proteome indicate that the proposed ABC-MCMC classification rule outperforms classical rules such as support vector machines, linear discriminant analysis, and 3-nearest neighbor classification when the sample size is small or the number of selected proteins used for classification is large.
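The ABC-MCMC idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the LC-MS model is replaced here by a toy Gaussian simulator, the summary statistic is simply the sample mean, and the prior bounds, tolerance `eps`, proposal width `step`, and all function names are assumptions chosen for illustration. The sketch covers only the likelihood-free sampling step, not the construction of the OBC from the resulting samples.

```python
import random
import statistics

# Illustrative assumption: a uniform prior on [PRIOR_LOW, PRIOR_HIGH].
PRIOR_LOW, PRIOR_HIGH = -5.0, 5.0


def simulate(theta, n, rng):
    """Toy simulator: n draws from Normal(theta, 1). Stands in for a forward model."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def prior_density(theta):
    """Density of the uniform prior; zero outside its support."""
    if PRIOR_LOW <= theta <= PRIOR_HIGH:
        return 1.0 / (PRIOR_HIGH - PRIOR_LOW)
    return 0.0


def abc_mcmc(observed, n_steps=5000, eps=0.2, step=0.5, seed=0):
    """Random-walk ABC-MCMC: the likelihood is never evaluated, only simulated from."""
    rng = random.Random(seed)
    n = len(observed)
    s_obs = statistics.fmean(observed)  # summary statistic of the observed data
    # Start inside the prior support, near the observed summary.
    theta = min(max(s_obs, PRIOR_LOW), PRIOR_HIGH)
    chain = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal
        p_new = prior_density(proposal)
        if p_new > 0.0:
            # Simulate data at the proposed parameter and compare summaries.
            s_sim = statistics.fmean(simulate(proposal, n, rng))
            close = abs(s_sim - s_obs) <= eps
            # With a symmetric proposal, the acceptance ratio is the prior ratio.
            ratio = p_new / prior_density(theta)
            if close and rng.random() < min(1.0, ratio):
                theta = proposal
        chain.append(theta)  # a rejected proposal repeats the current state
    return chain


# Usage: generate toy "observed" data, run the chain, discard burn-in.
observed = simulate(1.5, 20, random.Random(1))
chain = abc_mcmc(observed)
posterior_mean = statistics.fmean(chain[1000:])
print(round(posterior_mean, 3))
```

A smaller `eps` brings the sampled distribution closer to the exact posterior at the cost of a lower acceptance rate, which is the usual trade-off in ABC schemes of this kind.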