A hierarchical Naïve Bayes Model for handling sample heterogeneity in classification problems: an application to tissue microarrays.

Author information

Demichelis Francesca, Magni Paolo, Piergiorgi Paolo, Rubin Mark A, Bellazzi Riccardo

Affiliation

Bioinformatics, SRA, ITC-irst & Dept. of Information and Communication Technology, University of Trento, Trento, Italy.

Publication information

BMC Bioinformatics. 2006 Nov 24;7:514. doi: 10.1186/1471-2105-7-514.

PMID:17125514

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1698579/

Abstract

BACKGROUND

Uncertainty often affects molecular biology experiments and data for a variety of reasons. Heterogeneity of gene or protein expression within the same tumor tissue is one example of biological uncertainty that should be taken into account when molecular markers are used in decision making. Tissue Microarray (TMA) experiments allow large-scale profiling of tissue biopsies to investigate the protein patterns that characterize specific disease states. To account for possible biological heterogeneity, TMA studies sample the same patient multiple times and therefore obtain multiple measurements of the same protein target. The aim of this paper is to provide and validate a classification model that takes into account the uncertainty associated with measuring replicate samples.

RESULTS

We propose an extension of the well-known Naïve Bayes classifier that accounts for biological heterogeneity within a probabilistic framework based on Bayesian hierarchical models. The model can be learned efficiently from the training dataset and uses a closed-form classification equation, so it adds no computational cost with respect to the standard Naïve Bayes classifier. We validated the approach on several simulated datasets, comparing its performance with that of the Naïve Bayes classifier. Moreover, we demonstrated that explicitly dealing with heterogeneity can improve classification accuracy on a TMA prostate cancer dataset.
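
One standard way to obtain a closed-form classification equation of this kind, assumed here for illustration (the authors' exact parametrization may differ), is to discretize each measurement into K levels, summarize the replicates of feature i for a sample by the counts n_ik of replicates at each level k, and give each sample its own level probabilities θ drawn from a class-specific Dirichlet distribution with hyperparameters α_ic. Integrating θ out gives a Dirichlet-multinomial likelihood, which is then combined across the F features exactly as in standard Naïve Bayes. The symbols K, F, n_ik, α_ic, A_ic, N_i, s and θ̄_ic are introduced for this sketch only.

```latex
p(\mathbf{n}_i \mid c)
  = \int \prod_{k=1}^{K} \theta_k^{\,n_{ik}} \,
    \mathrm{Dir}(\boldsymbol{\theta} \mid \boldsymbol{\alpha}_{ic}) \, d\boldsymbol{\theta}
  = \frac{\Gamma(A_{ic})}{\Gamma(A_{ic} + N_i)}
    \prod_{k=1}^{K} \frac{\Gamma(\alpha_{ick} + n_{ik})}{\Gamma(\alpha_{ick})},
\qquad A_{ic} = \sum_{k=1}^{K} \alpha_{ick}, \quad N_i = \sum_{k=1}^{K} n_{ik}

p(c \mid \mathbf{n}_1, \dots, \mathbf{n}_F) \propto p(c) \prod_{i=1}^{F} p(\mathbf{n}_i \mid c)

\alpha_{ick} = s\,\bar{\theta}_{ick}, \; s \to \infty
  \;\Longrightarrow\;
  p(\mathbf{n}_i \mid c) \to \prod_{k=1}^{K} \bar{\theta}_{ick}^{\,n_{ik}}
```

The first line is the likelihood of one ordered set of N_i replicates; the multinomial coefficient is dropped because it is the same for every class. The last line shows that, under this assumed parametrization, a very large Dirichlet concentration s makes replicates behave like independent draws and recovers the standard Naïve Bayes likelihood on pooled replicates, while a smaller s means replicates of the same sample agree more strongly. Because each term is a ratio of Gamma functions, scoring a sample remains a closed-form sum of per-feature log terms.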

CONCLUSION

The proposed Hierarchical Naïve Bayes classifier can be conveniently applied to problems where within-sample heterogeneity must be taken into account, such as TMA experiments and other biological contexts in which several measurements (replicates) are available for the same biological sample. The new approach performs better than the standard Naïve Bayes model, in particular when the within-sample heterogeneity differs between classes.
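
To illustrate why modelling within-sample heterogeneity helps most when it differs between classes, the following Python sketch assumes a Dirichlet-multinomial hierarchy over discretized replicate scores. This is an illustrative assumption, not necessarily the paper's exact model, and the paper's procedure for learning the hyperparameters from training data is not reproduced. The function names (`to_counts`, `dm_loglik`, `posterior_hnb`, `posterior_nb`) and all hyperparameter values are hypothetical. In this toy setup the two classes share the same mean level probabilities and differ only in concentration, so a standard Naïve Bayes classifier on pooled replicates cannot tell them apart, while the hierarchical likelihood can.

```python
import math
from typing import Dict, List, Sequence

Counts = Sequence[int]    # replicate counts over K discrete levels, one feature
Params = Sequence[float]  # Dirichlet alphas or multinomial thetas, one feature


def to_counts(scores: Sequence[int], k: int) -> List[int]:
    """Turn replicate scores of one feature, e.g. [0, 0, 1, 2], into level counts."""
    return [sum(1 for s in scores if s == level) for level in range(k)]


def dm_loglik(counts: Counts, alpha: Params) -> float:
    """Dirichlet-multinomial log-likelihood of one ordered set of replicates.

    The multinomial coefficient is omitted: it is the same for every class
    and cancels when the posterior is normalized.
    """
    n, a = sum(counts), sum(alpha)
    ll = math.lgamma(a) - math.lgamma(a + n)
    for n_k, a_k in zip(counts, alpha):
        ll += math.lgamma(a_k + n_k) - math.lgamma(a_k)
    return ll


def multinomial_loglik(counts: Counts, theta: Params) -> float:
    """Standard Naive Bayes: replicates treated as i.i.d. draws from theta."""
    return sum(n_k * math.log(t_k) for n_k, t_k in zip(counts, theta) if n_k)


def normalize(log_scores: Dict[str, float]) -> Dict[str, float]:
    """Turn unnormalized log posteriors into probabilities (log-sum-exp)."""
    m = max(log_scores.values())
    z = sum(math.exp(s - m) for s in log_scores.values())
    return {c: math.exp(s - m) / z for c, s in log_scores.items()}


def posterior_hnb(sample: Sequence[Counts], alphas: Dict[str, Sequence[Params]],
                  priors: Dict[str, float]) -> Dict[str, float]:
    """p(c | sample): log prior plus a sum of per-feature Dirichlet-multinomial terms."""
    return normalize({c: math.log(priors[c])
                      + sum(dm_loglik(n, a) for n, a in zip(sample, alphas[c]))
                      for c in priors})


def posterior_nb(sample: Sequence[Counts], thetas: Dict[str, Sequence[Params]],
                 priors: Dict[str, float]) -> Dict[str, float]:
    """Same as posterior_hnb, but with plain multinomial (standard NB) terms."""
    return normalize({c: math.log(priors[c])
                      + sum(multinomial_loglik(n, t) for n, t in zip(sample, thetas[c]))
                      for c in priors})


# Hypothetical hyperparameters: one feature, K = 3 levels. Both classes have
# the same mean level probabilities (1/3 each) but different concentrations:
# in class "A" the replicates of a sample tend to agree (low within-sample
# heterogeneity); in class "B" they scatter like independent draws.
alphas = {"A": [[0.5, 0.5, 0.5]], "B": [[50.0, 50.0, 50.0]]}
thetas = {c: [[a_k / sum(a) for a_k in a] for a in feats] for c, feats in alphas.items()}
priors = {"A": 0.5, "B": 0.5}

homogeneous = [to_counts([0, 0, 0, 0], 3)]  # four replicates, all at level 0
mixed = [to_counts([0, 0, 1, 2], 3)]        # four replicates spread over levels

for name, sample in [("homogeneous", homogeneous), ("mixed", mixed)]:
    hnb = posterior_hnb(sample, alphas, priors)
    nb = posterior_nb(sample, thetas, priors)
    print(name, {c: round(p, 2) for c, p in hnb.items()},
          {c: round(p, 2) for c, p in nb.items()})
```

Running it prints `homogeneous {'A': 0.89, 'B': 0.11} {'A': 0.5, 'B': 0.5}` and `mixed {'A': 0.21, 'B': 0.79} {'A': 0.5, 'B': 0.5}`: the standard classifier is indifferent, while the hierarchical score uses the agreement between replicates. In this sketch the hierarchical score is still a closed-form sum of per-feature log terms (one `math.lgamma` call per level instead of one `math.log`), so its cost is of the same order as the standard classifier's.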

Similar articles

Classification of microarray data with factor mixture models.

Bioinformatics. 2006 Jan 15;22(2):202-8. doi: 10.1093/bioinformatics/bti779. Epub 2005 Nov 15.

Bayesian finite Markov mixture model for temporal multi-tissue polygenic patterns.

Biom J. 2009 Feb;51(1):56-69. doi: 10.1002/bimj.200710489.

A mixture model with random-effects components for clustering correlated gene-expression profiles.

Bioinformatics. 2006 Jul 15;22(14):1745-52. doi: 10.1093/bioinformatics/btl165. Epub 2006 May 3.

Cancer classification and prediction using logistic regression with Bayesian gene selection.

J Biomed Inform. 2004 Aug;37(4):249-59. doi: 10.1016/j.jbi.2004.07.009.

Tumor classification ranking from microarray data.

BMC Genomics. 2008 Sep 16;9 Suppl 2(Suppl 2):S21. doi: 10.1186/1471-2164-9-S2-S21.

Bias in error estimation when using cross-validation for model selection.

BMC Bioinformatics. 2006 Feb 23;7:91. doi: 10.1186/1471-2105-7-91.

A supervised hidden markov model framework for efficiently segmenting tiling array data in transcriptional and chIP-chip experiments: systematically incorporating validated biological knowledge.

Bioinformatics. 2006 Dec 15;22(24):3016-24. doi: 10.1093/bioinformatics/btl515. Epub 2006 Oct 12.

Bayesian modelling of shared gene function.

Bioinformatics. 2007 Aug 1;23(15):1936-44. doi: 10.1093/bioinformatics/btm280. Epub 2007 May 31.

A GMM-IG framework for selecting genes as expression panel biomarkers.

Artif Intell Med. 2010 Feb-Mar;48(2-3):75-82. doi: 10.1016/j.artmed.2009.07.006. Epub 2009 Dec 8.

Cited by

Fluorescence Lifetime Multiplexing (FLEX) for simultaneous high dimensional spatial biology in 3D.

Commun Biol. 2024 Aug 18;7(1):1012. doi: 10.1038/s42003-024-06702-8.

Integration of Prior Expectations and Suppression of Prediction Errors During Expectancy-Induced Pain Modulation: The Influence of Anxiety and Pleasantness.

J Neurosci. 2024 Apr 24;44(17):e1627232024. doi: 10.1523/JNEUROSCI.1627-23.2024.

Development and validation of a predictive scoring system for in-hospital mortality in COVID-19 Egyptian patients: a retrospective study.

Sci Rep. 2022 Dec 26;12(1):22352. doi: 10.1038/s41598-022-26471-w.

Prediction With Mixed Effects Models: A Monte Carlo Simulation Study.

Educ Psychol Meas. 2021 Dec;81(6):1118-1142. doi: 10.1177/0013164421992818. Epub 2021 Feb 16.

Identification of altered biological processes in heterogeneous RNA-sequencing data by discretization of expression profiles.

Nucleic Acids Res. 2020 Feb 28;48(4):1730-1747. doi: 10.1093/nar/gkz1208.

Prediction of malignant glioma grades using contrast-enhanced T1-weighted and T2-weighted magnetic resonance images based on a radiomic analysis.

Sci Rep. 2019 Dec 19;9(1):19411. doi: 10.1038/s41598-019-55922-0.

The periaqueductal gray and Bayesian integration in placebo analgesia.

Elife. 2018 Mar 20;7:e32930. doi: 10.7554/eLife.32930.

Analysis of prognostic factors for survival after surgery for gallbladder cancer based on a Bayesian network.

Sci Rep. 2017 Mar 22;7(1):293. doi: 10.1038/s41598-017-00491-3.

Evaluation of data discretization methods to derive platform independent isoform expression signatures for multi-class tumor subtyping.

BMC Genomics. 2015;16 Suppl 11(Suppl 11):S3. doi: 10.1186/1471-2164-16-S11-S3. Epub 2015 Nov 10.

A straightforward approach to designing a scoring system for predicting length-of-stay of cardiac surgery patients.

BMC Med Inform Decis Mak. 2014 Oct 13;14:89. doi: 10.1186/1472-6947-14-89.
