A test for comparing two groups of samples when analyzing multiple omics profiles.

Author information

Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands.

Publication information

BMC Bioinformatics. 2014 Jul 8;15:236. doi: 10.1186/1471-2105-15-236.

DOI:10.1186/1471-2105-15-236

PMID:25004928

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4227098/

Abstract

BACKGROUND

A number of statistical models have been proposed for studying the association between gene expression and copy number data in integrated analysis. The next step is to compare association patterns between different groups of samples.

RESULTS

We propose a method, named dSIM, to find differences in the association between copy number and gene expression when comparing two groups of samples. First, we use ridge regression to correct for the baseline associations between copy number and gene expression. Second, the global test is applied to the corrected data to find differences in association patterns between the two groups of samples. In a simulation study, we show that dSIM detects differences even in small genomic regions. We also apply dSIM to two publicly available breast cancer datasets and identify chromosome arms where copy number-led gene expression regulation differs between estrogen receptor-positive and estrogen receptor-negative samples. Despite differing genomic coverage, some of the selected arms are identified in both datasets.

CONCLUSION

We developed a flexible and robust method for studying association differences between two groups of samples while integrating genomic data from different platforms. dSIM can be used with most types of microarray and sequencing data, including methylation and microRNA expression. The method is implemented in R and will be made part of the Bioconductor package SIM.
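
The two steps described in the Results (a ridge correction for the baseline association, followed by a test for group differences on the corrected data) can be illustrated with a rough sketch. This is not the authors' R implementation and not the SIM package API. All function names, the penalty `lam`, and the simulated data are hypothetical. The second step uses a simple quadratic score statistic with a permutation p-value as a stand-in for the global test named in the abstract.

```python
import numpy as np

def ridge_residuals(X, y, lam=1.0):
    """Step 1 (assumed form): remove the baseline copy number / expression
    association by ridge regression and return the residual expression."""
    Xc = X - X.mean(axis=0)          # center copy number features
    yc = y - y.mean()                # center expression
    p = Xc.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    return yc - Xc @ beta

def interaction_score(X, resid, group):
    """Quadratic score for association between the residuals and the
    group-by-copy-number interaction terms (a stand-in statistic)."""
    g = group - group.mean()                     # centered group indicator
    Z = (X - X.mean(axis=0)) * g[:, None]        # interaction covariates
    return float(resid @ Z @ Z.T @ resid) / float(resid @ resid)

def dsim_like_test(X, y, group, lam=1.0, n_perm=999, seed=0):
    """Step 2 (stand-in): permutation p-value obtained by shuffling
    the group labels while keeping the corrected data fixed."""
    rng = np.random.default_rng(seed)
    resid = ridge_residuals(X, y, lam)
    observed = interaction_score(X, resid, group)
    perm = np.array([
        interaction_score(X, resid, rng.permutation(group))
        for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(perm >= observed)) / (n_perm + 1)
    return observed, float(p_value)

if __name__ == "__main__":
    # Simulated example: the copy number effect on expression differs by group.
    rng = np.random.default_rng(1)
    n, p = 80, 5
    X = rng.normal(size=(n, p))                  # copy number in a small region
    group = np.repeat([0, 1], n // 2)            # e.g. two receptor-status groups
    y = X[:, 0] + 2.0 * group * X[:, 0] + rng.normal(scale=0.5, size=n)
    stat, pval = dsim_like_test(X, y, group)
    print(f"statistic = {stat:.2f}, permutation p-value = {pval:.3f}")
```

In this sketch the ridge step absorbs the association shared by both groups, so the residuals retain mainly the part of the expression signal whose dependence on copy number differs between groups. The choice of penalty and the exact form of the test statistic are design decisions that this illustration does not take from the paper.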
