Yli-Hietanen Jari, Ylipää Antti, Yli-Harja Olli
Department of Signal Processing, Tampere University of Technology, P. O. Box 553, Tampere, 33101, Finland.
Chin J Cancer. 2015 Apr 11;34(10):423-6. doi: 10.1186/s40880-015-0008-8.
We examine the role of big data and machine learning in cancer research, using an example in which gene-level data from The Cancer Genome Atlas (TCGA) consortium are interpreted with a pathway-level model. As computational models become more complex, their sample requirements grow exponentially, because the number of possible combinations of variables grows exponentially with the number of variables. A large sample size is therefore needed. The number of variables in a computational model can be reduced by incorporating biological knowledge. One particularly successful way of doing this is to use available gene regulatory, signaling, metabolic, or context-specific pathway information. We conclude that incorporating existing biological knowledge is essential for progress in using big data for cancer research.
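The combinatorial argument in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: the function names, the placeholder gene and pathway identifiers, and the choice of averaging gene scores within a pathway are all assumptions made for illustration. The sketch counts the joint configurations of discretized variables and then shows how grouping genes into pathways reduces the number of variables.

```python
def joint_configurations(n_variables: int, levels: int = 2) -> int:
    """Count the distinct joint states of n discrete variables.

    With `levels` possible values per variable, the count is levels ** n,
    which is the exponential growth the abstract refers to.
    """
    return levels ** n_variables


def aggregate_to_pathways(gene_scores: dict, pathways: dict) -> dict:
    """Collapse gene-level scores into one score per pathway.

    Averaging is an assumed, deliberately simple aggregation rule;
    the source does not specify how pathway-level values are computed.
    """
    return {
        name: sum(gene_scores[gene] for gene in genes) / len(genes)
        for name, genes in pathways.items()
    }


# Hypothetical toy data: placeholder gene names, not real TCGA measurements.
gene_scores = {
    "gene_a": 1.0,
    "gene_b": 3.0,
    "gene_c": 0.5,
    "gene_d": 1.5,
    "gene_e": 4.0,
}

# Hypothetical gene sets standing in for curated pathway annotations.
pathways = {
    "pathway_1": ["gene_a", "gene_b"],
    "pathway_2": ["gene_c", "gene_d", "gene_e"],
}

pathway_scores = aggregate_to_pathways(gene_scores, pathways)

# Five binary gene-level variables versus two binary pathway-level variables.
gene_level_states = joint_configurations(len(gene_scores))
pathway_level_states = joint_configurations(len(pathway_scores))
```

In this toy setting, moving from five gene-level variables to two pathway-level variables shrinks the space of binary configurations from 2^5 to 2^2. The same reduction applied to thousands of genes is what makes pathway-level models more tractable for a fixed number of samples.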