Institute of Molecular Bioimaging and Physiology of the Italian National Research Council (IBFM-CNR), Milan, Italy, and Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy.
Institute of Molecular Bioimaging and Physiology of the Italian National Research Council (IBFM-CNR), Milan, Italy.
Front Biosci (Landmark Ed). 2017 Jun 1;22(10):1697-1712. doi: 10.2741/4566.
Cancer heterogeneity represents a major hurdle in the development of effective theranostic strategies, as it prevents the design of single, maximally efficient diagnostic, prognostic and therapeutic procedures, even for patients affected by the same tumor type. Computational techniques can now leverage the huge and ever-increasing amount of (epi)genomic data to tackle this problem, thereby providing new and valuable decision-support instruments for biologists and pathologists in the broad sphere of precision medicine. In this context, we introduce a novel classifier of cancer subtypes based on gene expression data and apply it to two different breast cancer datasets, from the TCGA and GEO repositories. The classifier is based on Support Vector Machines and uses information about the pathways involved in breast cancer development to reduce the huge variable space. Among the main results, we show that the classifier accuracy remains excellent even when the variable space is reduced 20-fold, hence providing a useful tool for cancer patient profiling even when experimental resources are limited.
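The abstract does not describe the implementation, so the following is only a minimal pure-Python sketch of the general idea under stated assumptions: expression columns are first restricted to the genes of a pathway gene set, and a linear SVM is then trained on the reduced matrix. The training method (Pegasos stochastic subgradient descent on the hinge loss), the one-vs-rest scheme for multiple subtypes, the gene names `G00`…`G39`, the pathway set `{"G00", "G01"}` and the toy cohort are all hypothetical choices made for illustration, not the authors' method or data. The 40 to 2 gene reduction merely mirrors a 20-fold reduction of the variable space.

```python
import random
from typing import Dict, List, Sequence, Set

Matrix = List[List[float]]


def select_pathway_genes(gene_names: Sequence[str], pathway_genes: Set[str]) -> List[int]:
    """Indices of the expression columns whose gene belongs to the pathway gene set."""
    return [j for j, g in enumerate(gene_names) if g in pathway_genes]


def restrict(X: Matrix, cols: List[int]) -> Matrix:
    """Keep only the selected columns (the reduced variable space)."""
    return [[row[j] for j in cols] for row in X]


def dot(w: List[float], x: List[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X: Matrix, y: List[int], lam: float = 0.01,
                     epochs: int = 50, seed: int = 0) -> List[float]:
    """Pegasos stochastic subgradient descent on the L2-regularised hinge loss.

    Labels must be +1/-1. A constant 1.0 feature is appended as a bias term
    (it is regularised too, a simplification of the standard SVM)."""
    rng = random.Random(seed)
    Xb = [list(row) + [1.0] for row in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, Xb[i]) < 1.0     # hinge loss is active
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the L2 term
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def train_one_vs_rest(X: Matrix, labels: List[str], **kw) -> Dict[str, List[float]]:
    """One binary SVM per subtype (subtype vs. all others)."""
    return {c: train_linear_svm(X, [1 if l == c else -1 for l in labels], **kw)
            for c in sorted(set(labels))}


def predict(models: Dict[str, List[float]], x: List[float]) -> str:
    """Assign the subtype whose SVM gives the largest decision value."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: dot(models[c], xb))


def toy_cohort(n_per_class: int, seed: int):
    """Hypothetical toy data: 40 genes, only G00 and G01 carry the subtype signal."""
    rng = random.Random(seed)
    X, labels = [], []
    for subtype, sign in (("A", 1.0), ("B", -1.0)):
        for _ in range(n_per_class):
            row = [rng.uniform(-1.0, 1.0) for _ in range(40)]  # uninformative genes
            row[0] = sign * 2.0 + rng.uniform(-0.5, 0.5)
            row[1] = -sign * 2.0 + rng.uniform(-0.5, 0.5)
            X.append(row)
            labels.append(subtype)
    return X, labels


genes = [f"G{j:02d}" for j in range(40)]                # hypothetical gene names
cols = select_pathway_genes(genes, {"G00", "G01"})      # hypothetical pathway: 40 -> 2
X_train, y_train = toy_cohort(20, seed=1)
X_test, y_test = toy_cohort(10, seed=2)
models = train_one_vs_rest(restrict(X_train, cols), y_train)
X_test_r = restrict(X_test, cols)
accuracy = sum(predict(models, x) == y for x, y in zip(X_test_r, y_test)) / len(y_test)
print(cols, f"test accuracy = {accuracy:.2f}")  # prints: [0, 1] test accuracy = 1.00
```

On real data the pathway gene set would come from a curated pathway resource and accuracy would be estimated by cross-validation; the toy cohort here is separable by construction, so the perfect accuracy says nothing about the results reported in the paper.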