Antosh Michael, Fox David, Cooper Leon N, Neretti Nicola
Department of Physics, Brown University, Providence, RI 02912, USA.
J Comput Biol. 2013 Jun;20(6):433-43. doi: 10.1089/cmb.2013.0017. Epub 2013 May 15.
Because a very large number of gene expression datasets are now publicly available, comparing results across experiments from different laboratories has become a common task. However, most existing methods for comparing gene expression datasets require setting arbitrary cutoffs (e.g., for statistical significance or fold change). Experimental protocols and statistical analyses differ between datasets, so the same kind of cutoff can end up selecting genes according to different criteria in each dataset. We propose a new method, CORaL, for comparing expression profiles across experiments using the ranks of genes in the different datasets. We introduce a maximization statistic that can be calculated recursively, which allows efficient searches over a large space (paths on a grid). We apply our method to both simulated and real datasets and show that it outperforms other existing rank-based algorithms. CORaL is a novel method for comparing gene expression data that performs well on simulated and real data, and it has the potential for wide and effective use in computational biology.
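The abstract does not define CORaL's statistic, so the following is only a generic illustration of the underlying idea, namely a quantity maximized over monotone paths on a grid and computed recursively by dynamic programming. It is not the authors' method. The function names `max_path_score` and `match_grid` are hypothetical. So is the scoring rule, which marks cell (i, j) with 1 when the gene ranked i-th in list A is ranked j-th in list B and with 0 otherwise. Under that assumed 0/1 scoring, the best path score equals the length of the longest common subsequence of the two ranked lists.

```python
from typing import Sequence


def max_path_score(score: list[list[float]]) -> float:
    """Maximum total score over monotone paths from the top-left to the
    bottom-right cell of a grid, moving only down or right.

    Computed recursively (dynamic programming):
        best[i][j] = score[i][j] + max(best[i-1][j], best[i][j-1])
    """
    if not score or not score[0]:
        return 0.0
    n_rows, n_cols = len(score), len(score[0])
    best = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            if i == 0 and j == 0:
                prev = 0.0
            elif i == 0:
                prev = best[i][j - 1]  # first row: can only arrive from the left
            elif j == 0:
                prev = best[i - 1][j]  # first column: can only arrive from above
            else:
                prev = max(best[i - 1][j], best[i][j - 1])
            best[i][j] = prev + score[i][j]
    return best[-1][-1]


def match_grid(rank_a: Sequence[str], rank_b: Sequence[str]) -> list[list[float]]:
    """Hypothetical scoring grid: cell (i, j) is 1.0 if the gene ranked
    i-th in list A is ranked j-th in list B, else 0.0."""
    pos_b = {gene: j for j, gene in enumerate(rank_b)}
    grid = [[0.0] * len(rank_b) for _ in range(len(rank_a))]
    for i, gene in enumerate(rank_a):
        if gene in pos_b:
            grid[i][pos_b[gene]] = 1.0
    return grid


if __name__ == "__main__":
    a = ["g1", "g2", "g3", "g4"]
    b = ["g1", "g3", "g2", "g4"]
    print(max_path_score(match_grid(a, b)))
```

Each cell is filled once from its two predecessors, so an n × m grid takes O(nm) time even though the number of monotone paths grows combinatorially. That gain is the general reason a recursively computable statistic makes a search over paths on a grid tractable. Turning such a raw score into a significance measure would need something further, such as a permutation null, which is not shown here.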