The Judith and David Coffey Life Lab, Charles Perkins Centre, University of Sydney, Sydney, NSW, Australia.
School of Mathematics and Statistics, University of Sydney, Sydney, NSW, Australia.
Bioinformatics. 2019 Mar 1;35(5):823-829. doi: 10.1093/bioinformatics/bty698.
Genes act as a system and not in isolation. Thus, it is important to consider coordinated changes in gene expression, rather than single genes, when investigating biological phenomena such as the aetiology of cancer. We have developed an approach, Differential Correlation across Ranked Samples (DCARS), for quantifying how changes in the association between pairs of genes may inform the outcome of interest. Modelling gene correlation across a continuous sample ranking does not require dichotomising samples into two distinct classes, and it can identify differences in gene correlation across early, mid or late stages of the outcome of interest.
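To make the idea of correlation changing along a sample ranking concrete, here is a minimal sketch in Python. It is not the DCARS implementation: the abstract does not specify the test statistic, so the sliding-window correlation, the spread-based summary and the permutation scheme below are all illustrative assumptions, and every function name is hypothetical.

```python
import math
import random
import statistics


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def local_correlations(x, y, window):
    """Correlation of x and y within each sliding window.

    Samples are assumed to be already ordered by the ranking of
    interest (for example, survival rank).
    """
    return [pearson(x[i:i + window], y[i:i + window])
            for i in range(len(x) - window + 1)]


def variability_statistic(x, y, window):
    """Assumed summary: spread of the local correlations along the ranking."""
    return statistics.pstdev(local_correlations(x, y, window))


def permutation_p_value(x, y, window, n_perm=200, seed=0):
    """Permute the sample order (keeping gene pairs together) and count how
    often the permuted statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = variability_statistic(x, y, window)
    idx = list(range(len(x)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        xp = [x[i] for i in idx]
        yp = [y[i] for i in idx]
        if variability_statistic(xp, yp, window) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: 40 ranked samples; the two genes are positively correlated in
# the first half of the ranking and negatively correlated in the second.
rng = random.Random(1)
gene_a = [rng.gauss(0, 1) for _ in range(40)]
gene_b = [(a if i < 20 else -a) + rng.gauss(0, 0.1)
          for i, a in enumerate(gene_a)]

local = local_correlations(gene_a, gene_b, window=10)
p_value = permutation_p_value(gene_a, gene_b, window=10)
print(local[0], local[-1], p_value)
```

Because the samples are never split into two classes, the same local correlations could also reveal a change confined to the middle of the ranking, which a two-group comparison would tend to dilute.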
We evaluated DCARS on real TCGA data against the typical Fisher Z-transformation test for differential correlation and against a typical approach that tests for interaction within a linear model. DCARS ranked gene pairs containing known cancer genes significantly more highly across several cancers, and our simulation study gave similar results. We applied DCARS to 13 cancer datasets in TCGA, revealing several distinct relationships in which survival ranking was associated with a change in correlation between genes. Furthermore, we demonstrated that DCARS can be used in conjunction with network analysis techniques to extract biological meaning from multi-layered and complex data.
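For reference, the Fisher Z-transformation test used as a comparator can be sketched as follows. This is the standard textbook form for two independent groups, not code from the paper; the function name and the example numbers are hypothetical. Note that it requires the samples to be split into two classes first, which is the step DCARS is designed to avoid.

```python
import math
from statistics import NormalDist


def fisher_z_test(r1, n1, r2, n2):
    """Two-sided test for a difference between two independent Pearson
    correlations r1 and r2, estimated from n1 and n2 samples."""
    z1 = math.atanh(r1)  # Fisher transformation of each correlation
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical example: a gene pair with correlation 0.8 in one group of
# 50 samples and 0.2 in another group of 50 samples.
z_stat, p_val = fisher_z_test(0.8, 50, 0.2, 50)
print(z_stat, p_val)
```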
The DCARS R package, sample data and Supplementary Files are available at https://github.com/shazanfar/DCARS. Publicly available data from The Cancer Genome Atlas (TCGA) were accessed using the TCGAbiolinks R package.
Supplementary data are available at Bioinformatics online.