Dipartimento di Informatica, Università degli Studi di Milano, via Comelico 39/41, 20135 Milano MI, Italia.
BMC Bioinformatics. 2012;13(Suppl 14):S3. doi: 10.1186/1471-2105-13-S14-S3. Epub 2012 Sep 7.
Co-expression based Cancer Modules (CMs) are sets of genes that act in concert to carry out specific functions in different cancer types. They are constructed by exploiting gene expression profiles related to specific clinical conditions, or expression signatures associated with specific processes altered in cancer. Unfortunately, genes involved in cancer are not always detectable using only expression signatures or co-expressed sets of genes, and in principle other types of functional interactions should be exploited to obtain a comprehensive picture of the molecular mechanisms underlying the onset and progression of cancer.
We propose a novel semi-supervised method to rank genes with respect to CMs using networks constructed from different sources of functional information, not limited to gene expression data. On the one hand, it exploits local learning strategies through score functions that extend the guilt-by-association approach; on the other hand, it exploits global learning strategies through graph kernels embedded in the score functions, which take into account the overall topology of the network. The proposed kernelized score functions compare favorably with other state-of-the-art semi-supervised machine learning methods for gene ranking in biological networks and scale well with the number of genes, thus allowing fast processing of very large gene networks.
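The interplay between local scoring and global graph kernels described above can be illustrated with a small sketch. This is not the authors' implementation. It assumes a p-step random walk kernel of the form K = ((a - 1) I + D^{-1/2} W D^{-1/2})^p with a >= 2, and it uses a simple average score over the labeled CM genes. Both choices, as well as every function name, are illustrative assumptions rather than details given in this abstract. Raising p lets the kernel reach genes further from the labeled ones, so the ranking moves from a purely local, guilt-by-association regime toward one that reflects more of the network topology.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def normalized_adjacency(W: Sequence[Sequence[float]]) -> Matrix:
    """Return D^{-1/2} W D^{-1/2}, where D is the diagonal degree matrix of W."""
    n = len(W)
    deg = [sum(row) for row in W]
    # Isolated genes (degree 0) get a zero row/column instead of a division by zero.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[inv_sqrt[i] * W[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain-Python matrix product, adequate for toy networks."""
    inner, cols = len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(A))]


def random_walk_kernel(W: Sequence[Sequence[float]], a: float = 2.0, p: int = 1) -> Matrix:
    """Assumed p-step random walk kernel ((a - 1) I + D^{-1/2} W D^{-1/2})^p (a >= 2, p >= 1)."""
    S = normalized_adjacency(W)
    n = len(W)
    base = [[S[i][j] + (a - 1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    K = base
    for _ in range(p - 1):  # each extra step reaches genes one more hop away
        K = matmul(K, base)
    return K


def average_score(K: Matrix, positives: Sequence[int]) -> List[float]:
    """Average kernel similarity of every gene to the labeled CM genes."""
    return [sum(K[v][x] for x in positives) / len(positives) for v in range(len(K))]


# Hypothetical toy network: a path 0-1-2-3-4; genes 0 and 1 are the labeled CM members.
W = [[0, 1, 0, 0, 0],
     [1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 1, 0]]
positives = [0, 1]

local = average_score(random_walk_kernel(W, a=2.0, p=1), positives)
wider = average_score(random_walk_kernel(W, a=2.0, p=2), positives)
print([round(s, 3) for s in local])
print([round(s, 3) for s in wider])
```

With p = 1, only genes adjacent to a CM gene receive a nonzero score, so gene 3 scores zero. With p = 2, gene 3, which is two hops from gene 1, receives a positive score, while gene 4 is still out of reach.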
The modular nature of kernelized score functions provides an algorithmic scheme from which different gene ranking algorithms can be derived. The results show that, using integrated functional networks, we can successfully predict CMs defined mainly through expression signatures obtained from gene expression profiling. A preliminary analysis of top-ranked "false positive" genes shows that our approach could eventually be applied to discover novel genes involved in the onset and progression of tumors related to specific CMs.
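The modularity mentioned above can be illustrated by treating the kernel and the score function as interchangeable parts. This sketch is hypothetical. The three score functions (average, nearest neighbour, and sum of the k largest similarities) and the toy similarity matrix are assumptions chosen for illustration, and the matrix stands in for a precomputed graph kernel. Genes outside the labeled CM are ranked, and the top-ranked ones correspond to the kind of "false positive" candidates one would inspect further.

```python
from typing import Callable, List, Sequence

Kernel = Sequence[Sequence[float]]
ScoreFn = Callable[[Kernel, int, Sequence[int]], float]


def avg_score(K: Kernel, v: int, positives: Sequence[int]) -> float:
    """Mean similarity of gene v to the CM genes."""
    return sum(K[v][x] for x in positives) / len(positives)


def nn_score(K: Kernel, v: int, positives: Sequence[int]) -> float:
    """Similarity of gene v to its most similar CM gene."""
    return max(K[v][x] for x in positives)


def make_knn_score(k: int) -> ScoreFn:
    """Build a score: sum of the k largest similarities between gene v and the CM genes."""
    def knn_score(K: Kernel, v: int, positives: Sequence[int]) -> float:
        return sum(sorted((K[v][x] for x in positives), reverse=True)[:k])
    return knn_score


def rank_candidates(K: Kernel, positives: Sequence[int], score_fn: ScoreFn) -> List[int]:
    """Rank genes not labeled as CM members, best candidate first."""
    labeled = set(positives)
    candidates = [v for v in range(len(K)) if v not in labeled]
    return sorted(candidates, key=lambda v: score_fn(K, v, positives), reverse=True)


# Toy symmetric similarity matrix standing in for a precomputed graph kernel
# (hypothetical values); genes 0 and 1 are the labeled CM members.
K = [[1.0, 0.8, 0.4, 0.7, 0.0],
     [0.8, 1.0, 0.4, 0.0, 0.1],
     [0.4, 0.4, 1.0, 0.2, 0.3],
     [0.7, 0.0, 0.2, 1.0, 0.4],
     [0.0, 0.1, 0.3, 0.4, 1.0]]
cm_genes = [0, 1]

for name, fn in [("average", avg_score), ("nearest neighbour", nn_score),
                 ("2-NN", make_knn_score(2))]:
    print(name, rank_candidates(K, cm_genes, fn))
```

On this toy matrix, the average score ranks gene 2 first because it is moderately similar to both CM genes. The nearest-neighbour score instead ranks gene 3 first because it is strongly similar to just one CM gene. Swapping the kernel or the score function changes the ranking algorithm without touching the rest of the pipeline.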