Steinfeld Israel, Navon Roy, Ardigò Diego, Zavaroni Ivana, Yakhini Zohar
Agilent Laboratories, Tel Aviv, Israel.
Bioinformatics. 2008 Aug 15;24(16):i90-7. doi: 10.1093/bioinformatics/btn279.
Unsupervised class discovery in gene expression data relies on the statistical signals in the data to exclusively drive the results. It is often the case, however, that one is interested in constraining the search space to respect certain biological prior knowledge while still allowing a flexible search within these boundaries.
We develop an approach to semi-supervised class discovery. One component of our approach uses clinical sample information to constrain the search space and guide the class discovery process toward biologically relevant partitions. A second component uses known biological annotation of genes to drive the search, seeking partitions that manifest strong differential expression in specific sets of genes. We develop efficient algorithms for these tasks, implementing both approaches and combinations thereof. We show that our method is robust enough to detect known clinical parameters in accordance with expected clinical values. We also use our method to elucidate putative risk factors for cardiovascular disease (CVD).
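As a rough illustration of how the two components might combine, the sketch below orders samples by a clinical value, considers only partitions that split this ordering at a threshold, and scores each partition by differential expression within a given gene set. This is not the MonoClaD implementation: the function names (`welch_t`, `best_monotone_split`), the use of the mean absolute Welch t statistic as the score, the `min_size` parameter, and the toy data are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least 2 values)."""
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if denom == 0.0:
        # Degenerate case: no within-group variation at all.
        return 0.0 if diff == 0.0 else float("inf")
    return diff / denom


def best_monotone_split(clinical, expression, gene_set, min_size=2):
    """Scan threshold partitions of samples ordered by a clinical value.

    clinical   : list of clinical values, one per sample
    expression : dict mapping gene name -> list of expression values per sample
    gene_set   : gene names used to score each partition (hypothetical annotation)
    Returns (threshold, score, low_indices, high_indices), or None if no
    admissible split exists.
    """
    order = sorted(range(len(clinical)), key=lambda i: clinical[i])
    best = None
    for cut in range(min_size, len(order) - min_size + 1):
        lo_val, hi_val = clinical[order[cut - 1]], clinical[order[cut]]
        if lo_val == hi_val:
            continue  # never separate samples with tied clinical values
        low, high = order[:cut], order[cut:]
        # Illustrative score: mean |t| over the genes in the set.
        score = mean([
            abs(welch_t([expression[g][i] for i in low],
                        [expression[g][i] for i in high]))
            for g in gene_set
        ])
        if best is None or score > best[1]:
            best = ((lo_val + hi_val) / 2, score, low, high)
    return best


# Toy data (invented): 8 samples, two genes that shift between samples 4 and 5.
clinical = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
expression = {
    "g1": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],
    "g2": [2.0, 2.1, 1.9, 2.2, 0.5, 0.4, 0.6, 0.7],
    "g3": [1.0, 3.0, 1.1, 2.9, 1.2, 3.1, 0.9, 3.0],  # not in the gene set
}
threshold, score, low, high = best_monotone_split(clinical, expression, ["g1", "g2"])
print(threshold, sorted(low), sorted(high))
```

Restricting the candidates to threshold splits of one clinical ordering reduces the search from exponentially many partitions to a linear scan, which is one plausible reading of how clinical information can constrain the search space while the gene set determines the score.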
MonoClaD (Monotone Class Discovery). See http://bioinfo.cs.technion.ac.il/people/zohar/MonoClad/.
Supplementary data are available at http://bioinfo.cs.technion.ac.il/people/zohar/MonoClad/software.html