Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, The Netherlands.
BMC Bioinformatics. 2014 Jul 8;15:236. doi: 10.1186/1471-2105-15-236.
A number of statistical models have been proposed for studying the association between gene expression and copy number data in integrated analysis. The next step is to compare association patterns between different groups of samples.
We propose a method, named dSIM, to find differences in the association between copy number and gene expression when comparing two groups of samples. First, we use ridge regression to correct for the baseline associations between copy number and gene expression. Second, the global test is applied to the corrected data to find differences in association patterns between the two groups. In a simulation study, we show that dSIM detects differences even in small genomic regions. We also apply dSIM to two publicly available breast cancer datasets and identify chromosome arms where copy number-led regulation of gene expression differs between estrogen receptor-positive and estrogen receptor-negative samples. In spite of differing genomic coverage, some arms are identified in both datasets.
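The two-step idea can be illustrated with a toy sketch in Python. This is not the dSIM implementation and not the API of the SIM package: it assumes a single copy number feature and a single expression feature, and it replaces the global test with a simple permutation test on a score-type statistic. All function names and the penalty `lam` are hypothetical choices made for illustration.

```python
import random


def ridge_residuals(x, y, lam=1.0):
    """Residuals of y after a one-predictor ridge fit on centered x.

    The intercept is unpenalized (handled by centering); lam is an
    illustrative penalty, not a value taken from the paper.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    xc = [v - mx for v in x]
    yc = [v - my for v in y]
    beta = sum(a * b for a, b in zip(xc, yc)) / (sum(a * a for a in xc) + lam)
    return [b - beta * a for a, b in zip(xc, yc)]


def score_statistic(resid, x, group):
    """Squared cross-product of the residuals with the group-by-copy-number term."""
    n = len(x)
    mx = sum(x) / n
    signs = [1.0 if g else -1.0 for g in group]
    interaction = [s * (v - mx) for s, v in zip(signs, x)]
    return sum(r * z for r, z in zip(resid, interaction)) ** 2


def permutation_p_value(x, y, group, lam=1.0, n_perm=999, seed=0):
    """Permutation p-value for a group difference in the x-y association.

    Step 1 removes the baseline (shared) association with ridge regression;
    step 2 tests whether the residuals still relate to copy number
    differently in the two groups, by shuffling the group labels.
    """
    rng = random.Random(seed)
    resid = ridge_residuals(x, y, lam)
    observed = score_statistic(resid, x, group)
    labels = list(group)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if score_statistic(resid, x, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

A call such as `permutation_p_value(copy_number, expression, is_er_positive)` would return a p-value for one copy number and expression pair, where the three argument names are placeholders for equal-length lists. The real method works on whole genomic regions with many features at once, which is where ridge regression and the global test become necessary.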
We developed a flexible and robust method for studying association differences between two groups of samples while integrating genomic data from different platforms. dSIM can be used with most types of microarray/sequencing data, including methylation and microRNA expression. The method is implemented in R and will be made part of the BioConductor package SIM.