Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
Proc Natl Acad Sci U S A. 2013 Jun 18;110(25):E2271-8. doi: 10.1073/pnas.1306909110. Epub 2013 Jun 6.
Finding regions of the genome that are significantly recurrent in noisy data is a common but difficult problem in present-day computational biology. Cores of recurrent events (CORE) is a computational approach to solving this problem that is based on a formalized notion by which "core" intervals explain the observed data, where the number of cores is the "depth" of the explanation. Given that formalization, we implement CORE as a combinatorial optimization procedure with depth chosen from considerations of statistical significance. An important feature of CORE is its ability to explain data with cores of widely varying lengths. We examine the performance of this system with synthetic data, and then provide two demonstrations of its utility with actual data. Applying CORE to a collection of DNA copy number profiles from single cells of a given tumor, we determine tumor population phylogeny and find the features that separate subpopulations. Applying CORE to comparative genomic hybridization data from a large set of tumor samples, we define regions of recurrent copy number aberration in breast cancer.
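The abstract does not spell out how cores are scored or how the optimization is carried out, so the following is only a minimal sketch of the general idea, under stated assumptions: events are half-open integer intervals, the degree to which a core explains an event is measured by the Jaccard index of the two intervals, candidate cores are built from observed event boundaries, and cores are picked greedily. The function names `jaccard` and `greedy_cores` are hypothetical, and the depth is passed in as a fixed number rather than chosen by a significance test.

```python
from itertools import product


def jaccard(a, b):
    """Overlap length divided by union length for two half-open intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def greedy_cores(events, depth):
    """Pick `depth` core intervals that together explain the events.

    Assumption (not from the abstract): each event starts fully unexplained
    (weight 1.0); a chosen core reduces that weight by the factor
    (1 - jaccard(core, event)).
    """
    # Candidate cores: every (start, end) pair drawn from observed boundaries.
    starts = sorted({s for s, _ in events})
    ends = sorted({e for _, e in events})
    candidates = [(s, e) for s, e in product(starts, ends) if s < e]

    unexplained = [1.0] * len(events)
    cores = []
    for _ in range(depth):
        def score(core):
            # Total unexplained event weight that this core would account for.
            return sum(w * jaccard(core, ev)
                       for w, ev in zip(unexplained, events))

        best = max(candidates, key=score)
        cores.append((best, score(best)))
        # Discount every event by how well the chosen core explains it.
        unexplained = [w * (1.0 - jaccard(best, ev))
                       for w, ev in zip(unexplained, events)]
    return cores


# Toy data: a short recurrent region and a longer one.
events = [(10, 20), (10, 20), (11, 20), (50, 90), (50, 90)]
cores = greedy_cores(events, depth=2)
for interval, core_score in cores:
    print(interval, round(core_score, 3))
```

On this toy input the two selected cores differ fourfold in length, which is meant to echo the point about cores of widely varying lengths. Choosing the depth from statistical significance, as the abstract describes, would require comparing the observed scores against scores from suitably randomized data, which is omitted here.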