Multivariate multi-way analysis of multi-source data.

Affiliation

Aalto University School of Science and Technology, Department of Information and Computer Science, Helsinki Institute for Information Technology HIIT, PO Box 15400, FI-00076 Aalto, Espoo, Finland.

Publication information

Bioinformatics. 2010 Jun 15;26(12):i391-8. doi: 10.1093/bioinformatics/btq174.

Abstract

MOTIVATION

Analysis of variance (ANOVA)-type methods are the default tool for analyzing data with multiple covariates. These tools have been generalized to the multivariate analysis of high-throughput biological datasets, where the main challenge is small sample size combined with high dimensionality. However, existing multi-way analysis methods are not designed for the increasingly important experiments in which data are obtained from multiple sources. Common examples of such settings include the integrated analysis of metabolic and gene expression profiles or, as in our case, of metabolic profiles from several tissues, in a controlled multi-way experimental setup where disease status, medical treatment, gender and time series are typical covariates.

RESULTS

We extend the applicability of multivariate, multi-way ANOVA-type methods to multi-source cases by introducing a novel Bayesian model. The method is capable of finding covariate-related dependencies between the sources. It assumes that the measurements consist of groups of similarly behaving variables, and estimates the multivariate covariate effects and their interaction effects for the discovered groups of variables. In particular, the method partitions the effects into those shared between the sources and source-specific ones. The method is specifically designed for datasets with small sample sizes and high dimensionality. We apply the method to a lipidomics dataset from a lung cancer study with a two-way experimental setup, where measurements have been taken from several tissues with mostly distinct lipids. The method is also directly applicable to gene expression and proteomics data.

AVAILABILITY

An R-implementation is available at http://www.cis.hut.fi/projects/mi/software/multiWayCCA/.
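To make the two-way setup in the abstract concrete, the following minimal sketch decomposes cell means of a balanced two-way design into a grand mean, two main effects and an interaction, and applies it to two simulated sources. It is not the authors' Bayesian model or their R implementation: it is a classical least-squares decomposition, and the function name `two_way_effects`, the covariate names, the effect sizes and the simulated data are all assumptions made for illustration. Averaging each source over its variables stands in, very roughly, for a single group of similarly behaving variables.

```python
import numpy as np


def two_way_effects(y, a, b):
    """Decompose cell means of a balanced two-way design into a grand mean,
    main effects of a and b, and their interaction (classical ANOVA terms)."""
    levels_a = np.unique(a)
    levels_b = np.unique(b)
    # Cell means: one row per level of a, one column per level of b.
    cell = np.array([[y[(a == i) & (b == j)].mean() for j in levels_b]
                     for i in levels_a])
    grand = cell.mean()
    effect_a = cell.mean(axis=1) - grand
    effect_b = cell.mean(axis=0) - grand
    interaction = cell - grand - effect_a[:, None] - effect_b[None, :]
    return grand, effect_a, effect_b, interaction


# Simulated data (hypothetical): 2 x 2 design with 5 samples per cell.
rng = np.random.default_rng(0)
n_rep = 5
disease = np.repeat([0, 0, 1, 1], n_rep)
treatment = np.repeat([0, 1, 0, 1], n_rep)

# Assumed structure: a disease effect shared by both sources, plus a
# treatment effect that is present only in the second source.
shared = 1.0 * disease
source_1 = shared[:, None] + 0.1 * rng.standard_normal((disease.size, 10))
source_2 = (shared[:, None] + 0.5 * treatment[:, None]
            + 0.1 * rng.standard_normal((disease.size, 8)))

for name, x in [("source_1", source_1), ("source_2", source_2)]:
    summary = x.mean(axis=1)  # one value per sample for the variable group
    grand, eff_d, eff_t, inter = two_way_effects(summary, disease, treatment)
    print(name, "disease effect:", eff_d, "treatment effect:", eff_t)
```

In this simulation the disease effect should appear with similar size in both sources, while the treatment effect should be visible only in the second one, which is the kind of shared versus source-specific split the abstract describes. A per-source analysis like this cannot pool evidence across sources or cope with far more variables than samples, which is the gap the abstract says the Bayesian model is designed to fill.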

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eb0e/2881359/28161995a817/btq174f1.jpg
