Yan Koon-Kiu, Wang Daifeng, Rozowsky Joel, Zheng Henry, Cheng Chao, Gerstein Mark
Genome Biol. 2014 Aug 28;15(8):R100. doi: 10.1186/gb-2014-15-8-r100.
High-dimensional genomics data are increasingly becoming available for many organisms. Here, we develop OrthoClust for simultaneously clustering data across multiple species. OrthoClust is a computational framework that integrates the co-association networks of individual species by utilizing the orthology relationships of genes between species. It outputs optimized modules that are fundamentally cross-species, which can either be conserved or species-specific. We demonstrate the application of OrthoClust using the RNA-Seq expression profiles of Caenorhabditis elegans and Drosophila melanogaster from the modENCODE consortium. A potential application of cross-species modules is to infer putative analogous functions of uncharacterized elements like non-coding RNAs based on guilt-by-association.
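The abstract does not state the objective function that OrthoClust optimizes, so the following is only a minimal sketch of the general idea: cluster each species' co-association network by a modularity-like score while rewarding orthologous gene pairs that land in the same module. The function names (`build_adjacency`, `score`, `greedy_cluster`), the coupling weight `kappa`, the greedy label-move heuristic, and the toy worm and fly genes are all assumptions made for illustration and are not taken from the published software.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Turn an undirected edge list into a dict: gene -> set of neighbours."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def score(labels, networks, orthologs, kappa=1.0):
    """Assumed objective: unnormalised modularity summed over species,
    plus kappa for every ortholog pair assigned to the same module."""
    total = 0.0
    for adj in networks:
        m = sum(len(nb) for nb in adj.values()) / 2  # number of edges
        if m == 0:
            continue
        for u, v in combinations(sorted(adj), 2):
            if labels[u] == labels[v]:
                observed = 1.0 if v in adj[u] else 0.0
                expected = len(adj[u]) * len(adj[v]) / (2 * m)
                total += observed - expected
    total += kappa * sum(1 for g, h in orthologs if labels[g] == labels[h])
    return total


def greedy_cluster(networks, orthologs, kappa=1.0, max_sweeps=20):
    """Simple heuristic (an assumption, not the published optimizer):
    start with one module per gene and move single genes between
    modules for as long as the score strictly improves."""
    nodes = {n for adj in networks for n in adj}
    nodes |= {g for pair in orthologs for g in pair}
    nodes = sorted(nodes)
    labels = {n: i for i, n in enumerate(nodes)}
    best = score(labels, networks, orthologs, kappa)
    for _ in range(max_sweeps):
        improved = False
        for n in nodes:
            original = labels[n]
            best_label = original
            for cand in sorted(set(labels.values())):
                if cand == original:
                    continue
                labels[n] = cand
                s = score(labels, networks, orthologs, kappa)
                if s > best + 1e-12:
                    best, best_label = s, cand
            labels[n] = best_label
            if best_label != original:
                improved = True
        if not improved:
            break
    return labels, best


# Hypothetical toy data: a triangle plus a separate edge in each species.
worm = build_adjacency([("w1", "w2"), ("w1", "w3"), ("w2", "w3"), ("w4", "w5")])
fly = build_adjacency([("f1", "f2"), ("f1", "f3"), ("f2", "f3"), ("f4", "f5")])
orthologs = [("w1", "f1"), ("w2", "f2"), ("w4", "f4")]

# A hand-written labeling with two conserved cross-species modules.
conserved = {g: 0 for g in ("w1", "w2", "w3", "f1", "f2", "f3")}
conserved.update({g: 1 for g in ("w4", "w5", "f4", "f5")})

labels, best = greedy_cluster([worm, fly], orthologs, kappa=1.0)
```

In this sketch, setting `kappa` to 0 reduces the score to independent per-species clustering, while larger values pull orthologous genes into shared modules. A module containing genes from only one species would then correspond to a species-specific module in the abstract's terminology.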