Cohn-Alperovich Dalia, Rabner Alona, Kifer Ilona, Mandel-Gutfreund Yael, Yakhini Zohar
Computer Science Department, Technion - Israel Institute of Technology, Haifa 3200003, Israel, Microsoft Research and Development Center, Haifa and Herzeliya, Israel.
Department of Biology, Technion - Israel Institute of Technology, Haifa 3200003, Israel.
Bioinformatics. 2016 Sep 1;32(17):i464-i472. doi: 10.1093/bioinformatics/btw435.
Biological measurement data are often reported as a ranked list of quantities, for example, differential expression (DE) of genes as inferred from microarrays or RNA-seq. Recent years have brought considerable progress in statistical tools for enrichment analysis in ranked lists. Several tools are now available that allow users to break the fixed set paradigm in assessing statistical enrichment of sets of genes. Continuing with the example, these tools identify factors that may be associated with measured differential expression. A drawback of existing tools is that they focus on identifying single factors associated with the observed or measured ranks and fail to address relationships between these factors. One such scenario, shown in our results, is when genes targeted by multiple miRNAs play a central role in the DE signal but the effect of each single miRNA is too subtle to be detected.
We propose statistical and algorithmic approaches for selecting a sub-collection of factors that can be aggregated into one ranked list that is heuristically most associated with an input ranked list (pivot). We examine performance on simulated data and apply our approach to cancer datasets. We find small sub-collections of miRNAs that are statistically associated with gene DE in several types of cancer, suggesting miRNA cooperativity in driving disease-related processes. Many of our findings are consistent with known roles of miRNAs in cancer, while others suggest previously unknown roles for certain miRNAs.
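The selection problem described above can be illustrated with a minimal sketch. This is not the MULSEA implementation: the greedy forward search, the aggregation by summed per-gene scores, and the use of Spearman correlation as the association measure are all assumptions made for illustration, and every function name below is hypothetical. The abstract does not specify the statistic or the search strategy actually used.

```python
from typing import Dict, List, Sequence, Tuple


def ranks(scores: Sequence[float]) -> List[int]:
    """Return 0-based ranks (0 = highest score); ties are broken by index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    result = [0] * len(scores)
    for position, index in enumerate(order):
        result[index] = position
    return result


def spearman(a: Sequence[int], b: Sequence[int]) -> float:
    """Spearman correlation of two tie-free rank vectors of equal length."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d2 / (n * (n * n - 1))


def aggregate(factors: Dict[str, Sequence[float]], chosen: List[str]) -> List[int]:
    """Assumed aggregation: sum the per-gene scores of the chosen factors, then rank."""
    n = len(next(iter(factors.values())))
    total = [sum(factors[name][i] for name in chosen) for i in range(n)]
    return ranks(total)


def greedy_select(
    pivot_scores: Sequence[float],
    factors: Dict[str, Sequence[float]],
    max_size: int = 3,
) -> Tuple[List[str], float]:
    """Greedily grow a sub-collection whose aggregated ranking best matches the pivot."""
    pivot = ranks(pivot_scores)
    chosen: List[str] = []
    best = float("-inf")
    while len(chosen) < max_size:
        candidates = [
            (spearman(pivot, aggregate(factors, chosen + [name])), name)
            for name in factors
            if name not in chosen
        ]
        if not candidates:
            break
        score, name = max(candidates)
        if score <= best:  # stop when adding a factor no longer helps
            break
        chosen.append(name)
        best = score
    return chosen, best


# Toy data: pivot DE scores for five genes and three hypothetical factor score vectors.
pivot_scores = [5, 4, 3, 2, 1]
factors = {
    "a": [4, 0, 3, 1, 2],
    "b": [1, 4, 0.5, 2, 0],
    "c": [0, 1, 2, 3, 4],
}
chosen, best = greedy_select(pivot_scores, factors)
print(chosen, best)
```

In this toy input, no single factor reproduces the pivot ranking on its own, while a pair of factors does once aggregated, which mirrors the motivating scenario of several individually subtle miRNAs jointly explaining a DE signal.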
Code and instructions for our algorithmic framework, MULSEA, are available at https://github.com/YakhiniGroup/MULSEA
Contact: dalia.cohn@gmail.com
Supplementary data are available at Bioinformatics online.