Integrative and regularized principal component analysis of multiple sources of data.

Authors

Liu Binghui, Shen Xiaotong, Pan Wei

Affiliations

School of Mathematics and Statistics, Northeast Normal University, Changchun, 130024, Jilin Province, China.

School of Statistics, University of Minnesota, 224 Church St. S.E., Minneapolis, 55455, MN, U.S.A.

Publication information

Stat Med. 2016 Jun 15;35(13):2235-50. doi: 10.1002/sim.6866. Epub 2016 Jan 12.

DOI: 10.1002/sim.6866

PMID: 26756854

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4853304/

Abstract

Integration of data of disparate types has become increasingly important to enhancing the power for new discoveries by combining complementary strengths of multiple types of data. One application is to uncover tumor subtypes in human cancer research in which multiple types of genomic data are integrated, including gene expression, DNA copy number, and DNA methylation data. In spite of their successes, existing approaches based on joint latent variable models require stringent distributional assumptions and may suffer from unbalanced scales (or units) of different types of data and non-scalability of the corresponding algorithms. In this paper, we propose an alternative based on integrative and regularized principal component analysis, which is distribution-free, computationally efficient, and robust against unbalanced scales. The new method performs dimension reduction simultaneously on multiple types of data, seeking data-adaptive sparsity and scaling. As a result, in addition to feature selection for each type of data, integrative clustering is achieved. Numerically, the proposed method compares favorably against its competitors in terms of accuracy (in identifying hidden clusters), computational efficiency, and robustness against unbalanced scales. In particular, compared with a popular method, the new method was competitive in identifying tumor subtypes associated with distinct patient survival patterns when applied to a combined analysis of DNA copy number, mRNA expression, and DNA methylation data in a glioblastoma multiforme study. Copyright © 2016 John Wiley & Sons, Ltd.
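
The abstract describes joint dimension reduction across several data types with sparsity and scale balancing. The sketch below illustrates that general idea only and is not the authors' algorithm: it assumes each data block is column-centered and divided by its Frobenius norm (one simple way to balance scales), then extracts a single sparse component by alternating rank-one updates with soft-thresholding. The function names (`balance_blocks`, `sparse_joint_pc`), the penalty value `lam`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(x, lam):
    """Elementwise soft-thresholding (proximal operator of the L1 penalty)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def balance_blocks(blocks):
    """Center each block's columns and divide by its Frobenius norm.

    Assumption: this is one simple scale-balancing choice, not the
    data-adaptive scaling proposed in the paper.
    """
    balanced = []
    for X in blocks:
        Xc = X - X.mean(axis=0)
        norm = np.linalg.norm(Xc)  # Frobenius norm for a 2-D array
        balanced.append(Xc / norm if norm > 0 else Xc)
    return balanced

def sparse_joint_pc(blocks, lam=0.0, n_iter=100):
    """Leading sparse component shared by all blocks (samples in rows)."""
    X = np.hstack(balance_blocks(blocks))
    # Start from the leading left singular vector of the stacked matrix.
    u = np.linalg.svd(X, full_matrices=False)[0][:, 0]
    v = np.zeros(X.shape[1])
    for _ in range(n_iter):
        v = soft_threshold(X.T @ u, lam)  # sparse loadings
        if not v.any():                   # penalty removed every feature
            break
        u = X @ v                         # sample scores
        u /= np.linalg.norm(u)
    # Split the loadings back into one vector per data block.
    sizes = np.cumsum([b.shape[1] for b in blocks])[:-1]
    return u, np.split(v, sizes)

# Simulated example: two blocks share a two-cluster structure in their
# first five features, but the second block is on a 1000x larger scale.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
signal = np.where(labels == 0, -1.0, 1.0)[:, None]
X1 = rng.normal(size=(40, 30))
X1[:, :5] += 2 * signal
X2 = rng.normal(size=(40, 50))
X2[:, :5] += 2 * signal
X2 *= 1000

u, loadings = sparse_joint_pc([X1, X2], lam=0.05)
clusters = (u > 0).astype(int)  # split samples by the sign of the score
agreement = max(np.mean(clusters == labels), np.mean(clusters != labels))
print("cluster agreement:", agreement)
print("nonzero loadings per block:", [int(np.count_nonzero(l)) for l in loadings])
```

Splitting on the sign of a single score only separates two groups; with more clusters one would typically extract several components and run a clustering algorithm such as k-means on the scores.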
