Nonlinear Joint Latent Variable Models and Integrative Tumor Subtype Discovery.

Author information

Liu Binghui, Shen Xiaotong, Pan Wei

Affiliations

School of Mathematics and Statistics, Northeast Normal University, Changchun, 130024 Jilin Province, China.

School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA.

Publication information

Stat Anal Data Min. 2016 Apr;9(2):106-116. doi: 10.1002/sam.11306. Epub 2016 Mar 28.

Abstract

Integrative analysis has been used to identify clusters by integrating data of disparate types, such as deoxyribonucleic acid (DNA) copy number alterations and DNA methylation changes, for discovering novel subtypes of tumors. Most existing integrative analysis methods are based on joint latent variable models, which are generally divided into two classes: joint factor analysis and joint mixture modeling, with continuous and discrete parameterizations of the latent variables, respectively. Despite recent progress, many issues remain. In particular, existing integration methods based on joint factor analysis may be inadequate for modeling multiple clusters because of the unimodality of the assumed Gaussian distribution, while those based on joint mixture modeling may lack the ability to perform dimension reduction and/or feature selection. In this paper, we employ a nonlinear joint latent variable model to allow for flexible modeling that can account for multiple clusters as well as conduct dimension reduction and feature selection. We propose a method, called integrative and regularized generative topographic mapping (irGTM), to perform simultaneous dimension reduction across multiple types of data while achieving feature selection separately for each data type. Simulations are performed to examine the operating characteristics of the methods, in which the proposed method compares favorably against the popular iCluster, which is based on a linear joint latent variable model. Finally, a glioblastoma multiforme (GBM) dataset is examined.
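
The abstract does not spell out the irGTM algorithm, so the following is only a minimal sketch of the standard generative topographic mapping idea that the method's name refers to, not the authors' implementation. It shows how a discrete grid of latent points, pushed through a nonlinear (radial basis function) mapping, gives a mixture density in data space that can represent several clusters while still yielding a low-dimensional coordinate per sample. The regularization and per-data-type feature selection of irGTM are not shown. All names (`rbf_basis`, `gtm_responsibilities`, `copy_number`, `methylation`), the random data, the grid sizes, and the choice to concatenate the two data types under a shared latent variable are assumptions made for illustration.

```python
import numpy as np

def rbf_basis(latent_points, centers, width):
    """Gaussian RBF design matrix with a bias column, shape (K, M + 1)."""
    sq_dist = (latent_points[:, None] - centers[None, :]) ** 2
    phi = np.exp(-sq_dist / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((latent_points.shape[0], 1))])

def gtm_responsibilities(data, phi, weights, beta):
    """Posterior probability of each latent grid point for each sample.

    data: (N, D); phi: (K, M + 1); weights: (M + 1, D);
    beta: inverse noise variance. Equal prior weight on every grid point.
    Returns an (N, K) matrix whose rows sum to 1.
    """
    means = phi @ weights  # (K, D): mixture centers in data space
    sq_dist = ((data[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    log_r = -0.5 * beta * sq_dist
    log_r -= log_r.max(axis=1, keepdims=True)  # stabilize before exponentiating
    resp = np.exp(log_r)
    return resp / resp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical data: two data types measured on the same 6 samples.
copy_number = rng.normal(size=(6, 4))
methylation = rng.normal(size=(6, 3))
# Assumption: both types share one latent variable, so features are concatenated.
joint = np.hstack([copy_number, methylation])

latent_points = np.linspace(-1.0, 1.0, 5)  # K = 5 grid points, 1-D latent space
centers = np.linspace(-1.0, 1.0, 3)        # M = 3 RBF centers
phi = rbf_basis(latent_points, centers, width=0.5)
weights = rng.normal(size=(phi.shape[1], joint.shape[1]))  # untrained, random

resp = gtm_responsibilities(joint, phi, weights, beta=1.0)
posterior_mean = resp @ latent_points      # low-dimensional coordinate per sample
```

In a full fit, `weights` and `beta` would be estimated by an EM algorithm rather than drawn at random; here they are left untrained so the sketch stays short. Each row of `resp` is a soft cluster assignment over the grid, and `posterior_mean` is the corresponding reduced representation.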

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6433/5761081/3717002d3f88/nihms906924f1.jpg
