Biomedical Informatics, Dept of Informatics, University of Oslo, Oslo, Norway.
BMC Genomics. 2012 Nov 4;13:591. doi: 10.1186/1471-2164-13-591.
Cancer progression is associated with genomic instability and an accumulation of DNA gains and losses. The growing variety of tools for measuring genomic copy number, including various types of array-CGH, SNP arrays and high-throughput sequencing, calls for a coherent framework that handles single- and multi-track segmentation problems in a unified and consistent way. In addition, the emergence of very high-density copy number scans creates a demand for computationally efficient segmentation algorithms.
A comprehensive Bioconductor package for copy number analysis is presented. The package offers a unified framework for single sample, multi-sample and multi-track segmentation and is based on statistically sound penalized least squares principles. Conditional on the number of breakpoints, the estimates are optimal in the least squares sense. A novel and computationally highly efficient algorithm is proposed that utilizes vector-based operations in R. Three case studies are presented.
The R package copynumber is a software suite for segmentation of single- and multi-track copy number data using algorithms based on coherent least squares principles.
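To illustrate what penalized least squares segmentation means in practice, the following is a minimal Python sketch of a piecewise-constant fit for a single track. It is not the package's algorithm: it is a plain quadratic-time dynamic program, and the function name `pcf_sketch`, the parameter `gamma`, and the choice of a fixed penalty per segment are assumptions made for illustration only. The criterion minimized is the sum of squared residuals around each segment mean plus `gamma` times the number of segments.

```python
def pcf_sketch(y, gamma):
    """Hypothetical helper: fit a piecewise-constant curve to y by
    minimizing (sum of squared errors) + gamma * (number of segments)."""
    n = len(y)
    # Prefix sums of y and y^2 give the squared error of any segment in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(y):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    best = [0.0] * (n + 1)      # best[k]: minimal penalized cost for y[:k]
    last_start = [0] * (n + 1)  # start index of the last segment in that optimum
    for k in range(1, n + 1):
        best_cost, best_j = float("inf"), 0
        for j in range(k):  # the last segment covers y[j:k]
            seg_sum = s1[k] - s1[j]
            sse = (s2[k] - s2[j]) - seg_sum * seg_sum / (k - j)
            cost = best[j] + sse + gamma
            if cost < best_cost:
                best_cost, best_j = cost, j
        best[k] = best_cost
        last_start[k] = best_j

    # Backtrack to recover the segment start positions.
    starts = []
    k = n
    while k > 0:
        k = last_start[k]
        starts.append(k)
    starts.reverse()

    # Within each segment the least squares estimate is the segment mean.
    fitted = []
    for a, b in zip(starts, starts[1:] + [n]):
        mean = (s1[b] - s1[a]) / (b - a)
        fitted.extend([mean] * (b - a))
    return starts, fitted


if __name__ == "__main__":
    track = [0.0] * 10 + [1.0] * 10
    print(pcf_sketch(track, gamma=0.5)[0])
```

A larger `gamma` yields fewer segments. For any fixed set of breakpoints the fitted values are the segment means, which is the sense in which the estimates are least squares optimal conditional on the number of breakpoints.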