A Bayesian hierarchical hidden Markov model for clustering and gene selection: Application to kidney cancer gene expression data.

Affiliations

Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minnesota, USA.

Department of Mathematics and Statistics, University of Minnesota Duluth, Minnesota, USA.

Publication information

Biom J. 2024 Jun;66(4):e2300173. doi: 10.1002/bimj.202300173.

DOI:10.1002/bimj.202300173

PMID:38817110

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11239327/

Abstract

We introduce a Bayesian approach for biclustering that accounts for the prior functional dependence between genes using hidden Markov models (HMMs). We utilize biological knowledge gathered from gene ontologies and the hidden Markov structure to capture the potential coexpression of neighboring genes. Our interpretable model-based clustering characterizes each cluster of samples by three groups of features: overexpressed, underexpressed, and irrelevant features. The proposed methods have been implemented in an R package and are used to analyze both simulated data and The Cancer Genome Atlas kidney cancer data.
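To make the role of the hidden Markov structure concrete, the following is a minimal toy sketch, not the authors' model or their R package. It assumes a single sample cluster, genes already placed in some ordering where neighbors are functionally related, Gaussian emissions with fixed means, and a fixed probability that a gene shares its neighbor's state. All names and parameter values (`MEANS`, `SD`, `STAY`, `viterbi`) are hypothetical and chosen for illustration only. The paper's method is a Bayesian hierarchical model, whereas this sketch merely decodes the most probable state sequence with the standard Viterbi algorithm.

```python
import math

# The three feature groups named in the abstract.
STATES = ("under", "irrelevant", "over")

# Hypothetical parameters, chosen only for illustration (not from the paper).
MEANS = {"under": -2.0, "irrelevant": 0.0, "over": 2.0}
SD = 1.0
STAY = 0.8  # assumed probability that a gene shares its neighbor's state


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_transition(prev, cur):
    """Log probability of moving from state prev to state cur."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1.0 - STAY) / (len(STATES) - 1))


def viterbi(values):
    """Return the most probable state sequence for ordered gene-level values."""
    if not values:
        return []
    # Uniform initial distribution over the three states.
    score = {
        s: math.log(1.0 / len(STATES)) + log_normal_pdf(values[0], MEANS[s], SD)
        for s in STATES
    }
    back = []  # back[i][cur] = best predecessor of cur at position i + 1
    for x in values[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            pointers[cur] = best_prev
            new_score[cur] = (
                score[best_prev]
                + log_transition(best_prev, cur)
                + log_normal_pdf(x, MEANS[cur], SD)
            )
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Under these assumed parameters, calling `viterbi([2.0, 2.0, 0.9, 2.0, 2.0])` labels all five genes as `over`, even though the value 0.9 taken alone lies closer to the `irrelevant` mean. This is the sense in which a Markov dependence lets neighboring genes borrow strength from one another.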
