
A Bayesian semiparametric factor analysis model for subtype identification.

Authors

Sun Jiehuan, Warren Joshua L, Zhao Hongyu

Publication information

Stat Appl Genet Mol Biol. 2017 Apr 25;16(2):145-158. doi: 10.1515/sagmb-2016-0051.

Abstract

Disease subtype identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly used to infer disease subtypes, and the resulting subtypes often yield biologically meaningful insights into disease. Despite many successes, existing clustering methods may not perform well when genes are highly correlated and when, because of the high dimensionality, many uninformative genes are included in the clustering. In this article, we introduce a novel Bayesian subtype identification method based on gene expression profiles. This method, called BCSub, adopts a semiparametric Bayesian factor analysis model that reduces the data to a few factor scores, which are then used for clustering. Specifically, the factor scores are assumed to follow a Dirichlet process mixture model, which induces the clustering. Through extensive simulation studies, we show that BCSub performs better than commonly used clustering methods. When applied to two gene expression datasets, our model identifies subtypes that are more clinically relevant than those identified by existing methods.
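
To make the structure described in the abstract concrete, the following is a minimal sketch of a generic factor analysis model whose factor scores follow a Dirichlet process mixture. It is not taken from the paper: every symbol (y_i, mu, Lambda, eta_i, Sigma, m_i, S_i, alpha, G_0) is an assumption introduced here for illustration, and the authors' exact specification and priors may differ.

```latex
% Generic sketch only; the notation is assumed, not the paper's.
% y_i: p-dimensional expression profile of sample i
% eta_i: k-dimensional factor scores, with k much smaller than p
\[
\begin{aligned}
y_i &= \mu + \Lambda \eta_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}_p(0, \Sigma), \\
\eta_i \mid (m_i, S_i) &\sim \mathcal{N}_k(m_i, S_i), \\
(m_i, S_i) \mid G &\sim G, \\
G &\sim \mathrm{DP}(\alpha, G_0).
\end{aligned}
\]
```

In this sketch the loading matrix Lambda (p by k) carries the dimension reduction, and the clustering comes from the Dirichlet process: a draw G is almost surely discrete, so several samples can share the same pair (m_i, S_i), and samples that share a pair form one subtype. The number of subtypes is therefore inferred from the data instead of being fixed in advance.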
