Demichelis Francesca, Magni Paolo, Piergiorgi Paolo, Rubin Mark A, Bellazzi Riccardo
Bioinformatics, SRA, ITC-irst & Dept. of Information and Communication Technology, University of Trento, Trento, Italy.
BMC Bioinformatics. 2006 Nov 24;7:514. doi: 10.1186/1471-2105-7-514.
Uncertainty often affects molecular biology experiments and data for different reasons. Heterogeneity of gene or protein expression within the same tumor tissue is an example of biological uncertainty that should be taken into account when molecular markers are used in decision making. Tissue Microarray (TMA) experiments allow for large-scale profiling of tissue biopsies, investigating protein patterns characterizing specific disease states. TMA studies deal with multiple sampling of the same patient, and therefore with multiple measurements of the same protein target, to account for possible biological heterogeneity. The aim of this paper is to provide and validate a classification model that takes into consideration the uncertainty associated with measuring replicate samples.
We propose an extension of the well-known Naïve Bayes classifier, which accounts for biological heterogeneity in a probabilistic framework, relying on Bayesian hierarchical models. The model, which can be efficiently learned from the training dataset, exploits a closed-form classification equation and thus incurs no additional computational cost with respect to the standard Naïve Bayes classifier. We validated the approach on several simulated datasets, comparing its performance with that of the Naïve Bayes classifier. Moreover, we demonstrated that explicitly dealing with heterogeneity can improve classification accuracy on a TMA prostate cancer dataset.
The proposed Hierarchical Naïve Bayes classifier can be conveniently applied in problems where within-sample heterogeneity must be taken into account, such as TMA experiments and biological contexts where several measurements (replicates) are available for the same biological sample. The performance of the new approach is better than that of the standard Naïve Bayes model, in particular when the within-sample heterogeneity differs across classes.
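As an illustration of how a closed-form classification rule can absorb replicate measurements, the following is a minimal sketch under stated assumptions, not the model specified in the paper. It assumes continuous features with a two-level Gaussian hierarchy: a patient-level mean drawn from a class-specific normal distribution, and replicates drawn around that mean with a class-specific within-sample variance. The function names (`hier_gauss_loglik`, `classify`) and the parameter layout `(M, tau2, sigma2)` are hypothetical.

```python
import math


def hier_gauss_loglik(replicates, M, tau2, sigma2):
    """Log marginal likelihood of one patient's replicates for one feature.

    Assumed model (illustrative only, not taken from the paper):
      patient-level mean   mu ~ Normal(M, tau2)
      each replicate       x_k | mu ~ Normal(mu, sigma2)
    Integrating out mu analytically gives the closed form below.
    """
    n = len(replicates)
    xbar = sum(replicates) / n
    # Within-sample sum of squares around the patient's own mean.
    within_ss = sum((x - xbar) ** 2 for x in replicates)
    total_var = sigma2 + n * tau2
    return (
        -0.5 * n * math.log(2 * math.pi)
        - 0.5 * (n - 1) * math.log(sigma2)
        - 0.5 * math.log(total_var)
        - 0.5 * within_ss / sigma2
        - 0.5 * n * (xbar - M) ** 2 / total_var
    )


def classify(patient, params, priors):
    """Pick the class with the highest log posterior score.

    patient   : list of replicate lists, one list per feature
    params[c] : list of (M, tau2, sigma2) tuples, one per feature
    priors[c] : prior probability of class c
    Features are treated as conditionally independent given the class,
    as in standard Naive Bayes.
    """
    scores = {}
    for c, feature_params in params.items():
        score = math.log(priors[c])
        for reps, (M, tau2, sigma2) in zip(patient, feature_params):
            score += hier_gauss_loglik(reps, M, tau2, sigma2)
        scores[c] = score
    best = max(scores, key=scores.get)
    return best, scores


# Two hypothetical classes with the same mean but different
# within-sample variance (sigma2).
params = {"A": [(0.0, 1.0, 0.1)], "B": [(0.0, 1.0, 4.0)]}
priors = {"A": 0.5, "B": 0.5}
spread_label, _ = classify([[-2.0, 0.0, 2.0]], params, priors)
tight_label, _ = classify([[-0.1, 0.0, 0.1]], params, priors)
```

In this toy setup both patients have the same replicate average, so a classifier that only sees the average could not separate them. The sketch assigns the widely spread replicates to the high-variance class and the tightly clustered ones to the low-variance class, which is the kind of situation in which the abstract reports the largest gain over standard Naïve Bayes.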