School of Information Engineering, Wuhan University of Technology, Wuhan, Hubei, 430070, China.
Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, 22203, USA.
BMC Bioinformatics. 2019 Jan 3;20(1):1. doi: 10.1186/s12859-018-2565-8.
Genome-wide DNA copy number changes are hallmark events in the initiation and progression of cancers. Quantitative analysis of somatic copy number alterations (CNAs) has broad applications in cancer research. As the capacity of high-throughput sequencing technologies increases, fast and efficient segmentation algorithms are required to characterize high-density CNA data.
A fast and informative segmentation algorithm, DBS (Deviation Binary Segmentation), is developed and discussed. The DBS method is based on the least absolute error principle and is inspired by segmentation methods rooted in the circular binary segmentation procedure. DBS uses point-by-point model calculation to ensure the accuracy of segmentation, and combines a binary search algorithm with heuristics derived from the Central Limit Theorem. The DBS algorithm is efficient, with a computational complexity of O(n log n), and is faster than its predecessors. Moreover, DBS measures the change-point amplitude, that is, the difference between the mean values of the two adjacent segments at a breakpoint. The degree of significance of this amplitude is determined by the weighted average deviation at the breakpoint. Accordingly, using the binary tree of significance degrees constructed during segmentation, DBS indicates whether a segmentation result is over- or under-segmented.
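The general idea of recursive binary segmentation with a length-weighted change-point score can be illustrated with a minimal sketch. This is not the DBS algorithm or ToolSeg code: it scans every candidate split rather than using DBS's binary search, and it scores a split with a standardized difference of segment means, a choice motivated by the Central Limit Theorem. The class name `BinarySegmentationSketch`, the parameters `sigma`, `threshold`, and `minLen`, and the assumption that the noise standard deviation `sigma` is known are all hypothetical and introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative recursive binary segmentation (NOT the published DBS method).
 * All names and parameters here are assumptions made for this sketch.
 */
final class BinarySegmentationSketch {

    /**
     * Returns breakpoints in ascending order. A breakpoint b means that
     * a new segment starts at index b.
     *
     * @param x         signal, e.g. log-ratio copy number values
     * @param sigma     assumed known noise standard deviation (must be > 0)
     * @param threshold minimum standardized mean difference to accept a split
     * @param minLen    minimum number of points on each side of a split
     */
    static List<Integer> segment(double[] x, double sigma, double threshold, int minLen) {
        // prefix[i] is the sum of x[0..i-1], so any segment mean costs O(1).
        double[] prefix = new double[x.length + 1];
        for (int i = 0; i < x.length; i++) {
            prefix[i + 1] = prefix[i] + x[i];
        }
        List<Integer> breakpoints = new ArrayList<>();
        split(prefix, 0, x.length, sigma, threshold, minLen, breakpoints);
        return breakpoints;
    }

    // Segments the half-open interval [lo, hi) and appends breakpoints in order.
    private static void split(double[] prefix, int lo, int hi, double sigma,
                              double threshold, int minLen, List<Integer> out) {
        int best = -1;
        double bestScore = 0.0;
        for (int k = lo + minLen; k <= hi - minLen; k++) {
            int nL = k - lo;
            int nR = hi - k;
            double meanL = (prefix[k] - prefix[lo]) / nL;
            double meanR = (prefix[hi] - prefix[k]) / nR;
            // Change-point amplitude |meanL - meanR|, scaled by the standard
            // error of a difference of means: sigma * sqrt(1/nL + 1/nR).
            double score = Math.abs(meanL - meanR)
                    / (sigma * Math.sqrt(1.0 / nL + 1.0 / nR));
            if (score > bestScore) {
                bestScore = score;
                best = k;
            }
        }
        if (best < 0 || bestScore < threshold) {
            return; // no significant change point in this interval
        }
        // In-order recursion keeps the output sorted.
        split(prefix, lo, best, sigma, threshold, minLen, out);
        out.add(best);
        split(prefix, best, hi, sigma, threshold, minLen, out);
    }
}
```

For example, calling `BinarySegmentationSketch.segment(x, 1.0, 4.0, 2)` on a signal of 20 zeros followed by 20 twos yields the single breakpoint 20. Each level of recursion costs time linear in the interval length, so the total cost is O(n log n) only when splits are reasonably balanced; the worst case for this naive scan is quadratic.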
DBS is implemented in ToolSeg, a platform-independent, open-source Java application that includes a graphical user interface, simulation data generation, and various segmentation methods implemented natively in Java.