Caravagna Giulio, Graudenzi Alex, Ramazzotti Daniele, Sanz-Pamplona Rebeca, De Sano Luca, Mauri Giancarlo, Moreno Victor, Antoniotti Marco, Mishra Bud
Department of Informatics, Systems and Communication, University of Milan-Bicocca, 20126 Milan, Italy; School of Informatics, University of Edinburgh, Edinburgh EH8 9YL, United Kingdom;
Department of Informatics, Systems and Communication, University of Milan-Bicocca, 20126 Milan, Italy; Institute of Molecular Bioimaging and Physiology, Italian National Research Council, 93-I-20090 Milan, Italy;
Proc Natl Acad Sci U S A. 2016 Jul 12;113(28):E4025-34. doi: 10.1073/pnas.1520213113. Epub 2016 Jun 28.
The genomic evolution inherent to cancer relates directly to a renewed focus on the voluminous next-generation sequencing data and machine learning for the inference of explanatory models of how the (epi)genomic events are choreographed in cancer initiation and development. However, despite the increasing availability of multiple additional -omics data, this quest has been frustrated by various theoretical and technical hurdles, mostly stemming from the dramatic heterogeneity of the disease. In this paper, we build on our recent work on the "selective advantage" relation among driver mutations in cancer progression and investigate its applicability to the modeling problem at the population level. Here, we introduce PiCnIc (Pipeline for Cancer Inference), a versatile, modular, and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes. The pipeline has many translational implications because it combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations, and progression model inference. We demonstrate PiCnIc's ability to reproduce much of the current knowledge on colorectal cancer progression as well as to suggest novel experimentally verifiable hypotheses.