Two-stage linked component analysis for joint decomposition of multiple biologically related data sets.

Author information

Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA.

Department of Neuroscience, Johns Hopkins University, Baltimore, MD, 21205, USA.

Publication information

Biostatistics. 2022 Oct 14;23(4):1200-1217. doi: 10.1093/biostatistics/kxac005.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9566367/

Abstract

Integrative analysis of multiple data sets has the potential to fully leverage the vast amount of high-throughput biological data being generated. In particular, such analysis will be powerful for making inferences from publicly available collections of genetic, transcriptomic, and epigenetic data sets that are designed to study shared biological processes but that vary in their target measurements, biological variation, unwanted noise, and batch variation. Thus, methods that enable the joint analysis of multiple data sets are needed to gain insights into shared biological processes that would otherwise be hidden by unwanted intra-data set variation. Here, we propose a method called two-stage linked component analysis (2s-LCA) to jointly decompose multiple biologically related experimental data sets whose biological and technological relationships can be structured into the decomposition. The consistency of the proposed method is established, and its empirical performance is evaluated via simulation studies. We apply 2s-LCA to jointly analyze four data sets focused on human brain development and identify meaningful patterns of gene expression in human neurogenesis that have shared structure across these data sets.
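The abstract does not spell out the steps of 2s-LCA, so the following is only a generic illustration of what a two-stage joint decomposition can look like, not the authors' algorithm. It assumes every data set is a features-by-samples matrix over the same features (for example, the same genes). Stage 1 takes a truncated SVD of each data set to obtain a low-rank signal subspace; stage 2 takes an SVD of the stacked bases to find directions present in all of those subspaces. The function name `two_stage_shared_subspace`, its parameters `ranks` and `n_shared`, and the synthetic data are all hypothetical. The sketch also ignores the known biological and technological relationships among data sets that the abstract says 2s-LCA can build into the decomposition.

```python
import numpy as np


def two_stage_shared_subspace(datasets, ranks, n_shared):
    """Illustrative two-stage decomposition (an assumption, not the published 2s-LCA).

    datasets : list of (n_features, n_samples_k) arrays sharing the same features
    ranks    : number of leading components kept per data set in stage 1
    n_shared : number of shared directions returned by stage 2
    """
    bases = []
    for X, r in zip(datasets, ranks):
        # Center each feature (row) across the samples of this data set.
        Xc = X - X.mean(axis=1, keepdims=True)
        # Stage 1: leading left singular vectors span the data set's signal subspace.
        U, _, _ = np.linalg.svd(Xc, full_matrices=False)
        bases.append(U[:, :r])

    # Stage 2: SVD of the column-stacked orthonormal bases.
    stacked = np.hstack(bases)
    U, s, _ = np.linalg.svd(stacked, full_matrices=False)

    # s**2 are eigenvalues of the sum of the per-data-set projection matrices,
    # so s**2 / K lies in [0, 1]; values near 1 mean the direction lies in
    # (almost) every data set's stage-1 subspace.
    support = s**2 / len(datasets)
    return U[:, :n_shared], support


# Synthetic example: three data sets share one feature pattern and each has
# one data-set-specific pattern plus a small amount of noise.
rng = np.random.default_rng(0)
n_features, n_samples = 40, 30
shared = rng.normal(size=n_features)
shared /= np.linalg.norm(shared)

datasets = []
for _ in range(3):
    specific = rng.normal(size=n_features)
    specific /= np.linalg.norm(specific)
    X = (
        5.0 * np.outer(shared, rng.normal(size=n_samples))
        + 3.0 * np.outer(specific, rng.normal(size=n_samples))
        + 0.01 * rng.normal(size=(n_features, n_samples))
    )
    datasets.append(X)

components, support = two_stage_shared_subspace(datasets, ranks=[2, 2, 2], n_shared=1)
# Absolute cosine between the recovered direction and the planted shared pattern.
alignment = abs(float(components[:, 0] @ shared))
print(alignment, support[:3])
```

In this toy setting the leading stage-2 direction should align closely with the planted shared pattern, while the data-set-specific patterns receive lower support. Real expression data would need normalization, a principled choice of ranks, and handling of batch structure, none of which this sketch attempts.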
